W. Schultz 1997 + Montague, Dayan & Sejnowski 1996: dopamine as reward-prediction error; foundational for modern work + 2024 deep-RL TD-learning generalizes it.
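A minimal sketch of the TD prediction error the note refers to, delta = r + gamma * V(s') - V(s), using tabular TD(0) on a three-step trial. The function name, the trial layout (reward only at the last step), and the values of gamma and alpha are all illustrative assumptions, not taken from the cited papers.

```python
def td0_prediction_errors(rewards, values, gamma=0.9, alpha=0.1):
    """Run one trial of tabular TD(0) over states 0..n-1.

    Updates `values` in place and returns the prediction error at each step.
    The state after the last step is treated as terminal (value 0).
    """
    deltas = []
    n = len(rewards)
    for t in range(n):
        v_next = values[t + 1] if t + 1 < n else 0.0
        # Reward-prediction error: what was received plus the discounted
        # next estimate, minus what was predicted.
        delta = rewards[t] + gamma * v_next - values[t]
        values[t] += alpha * delta
        deltas.append(delta)
    return deltas


# Hypothetical trial: no reward for two steps, then a reward of 1.
rewards = [0.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0]

first = td0_prediction_errors(rewards, values)  # untrained: reward is a surprise
for _ in range(500):
    last = td0_prediction_errors(rewards, values)  # trained: reward is predicted
```

On the first trial the error at reward delivery is 1; after repeated trials the value estimates come to predict the reward and the errors shrink toward 0, which is the qualitative pattern the reward-prediction-error account describes.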