Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).
Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
Dabney, W., Rowland, M., Bellemare, M. G. & Munos, R. Distributional reinforcement learning with quantile regression. In Proc. 32nd AAAI Conference on Artificial Intelligence 2892–2901 (AAAI, 2018).
Lyle, C., Bellemare, M. G. & Castro, P. S. A comparative analysis of expected and distributional reinforcement learning. In Proc. 33rd AAAI Conference on Artificial Intelligence 4504–4511 (AAAI, 2019).
Bellemare, M. G., Dabney, W. & Rowland, M. Distributional Reinforcement Learning (MIT Press, 2023).
Muller, T. H. et al. Distributional reinforcement learning in prefrontal cortex. Nat. Neurosci. 27, 403–408 (2024).
Avvisati, R. et al. Distributional coding of associative learning in discrete populations of midbrain dopamine neurons. Cell Rep. 43, 114080 (2024).
Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).
Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. Proc. Mach. Learn. Res. 70, 449–458 (2017).
Martin, J., Lyskawinski, M., Li, X. & Englot, B. Stochastically dominant distributional reinforcement learning. Proc. Mach. Learn. Res. 119, 6745–6754 (2020).
Théate, T. & Ernst, D. Risk-sensitive policy with distributional reinforcement learning. Algorithms 16, 325 (2023).
Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, 2014).
Kurth-Nelson, Z. & Redish, A. D. Temporal-difference reinforcement learning with distributed representations. PLoS ONE 4, e7362 (2009).
Fedus, W., Gelada, C., Bengio, Y., Bellemare, M. G. & Larochelle, H. Hyperbolic discounting and learning over multiple horizons. Preprint at https://doi.org/10.48550/arXiv.1902.06865 (2019).
Janner, M., Mordatch, I. & Levine, S. Gamma-models: generative temporal difference learning for infinite-horizon prediction. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (eds Larochelle, H. et al.) 1724–1735 (Curran Associates, 2020).
Thakoor, S. et al. Generalised policy improvement with geometric policy composition. Proc. Mach. Learn. Res. 162, 21272–21307 (2022).
Shankar, K. H. & Howard, M. W. A scale-invariant internal representation of time. Neural Comput. 24, 134–193 (2012).
Tano, P., Dayan, P. & Pouget, A. A local temporal difference code for distributional reinforcement learning. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (eds Larochelle, H. et al.) 13662–13673 (Curran Associates, 2020).
Tiganj, Z., Gershman, S. J., Sederberg, P. B. & Howard, M. W. Estimating scale-invariant future in continuous time. Neural Comput. 31, 681–709 (2019).
Barlow, H. B. in Sensory Communication (ed. Rosenblith, W. A.) 217–234 (MIT Press, 1961).
Laughlin, S. A simple coding procedure enhances a neuron’s information capacity. Z. Naturforsch. C 36, 910–912 (1981).
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
Fairhall, A. L., Lewen, G. D., Bialek, W. & de Ruyter Van Steveninck, R. R. Efficiency and ambiguity in an adaptive neural code. Nature 412, 787–792 (2001).
Lau, B. & Glimcher, P. W. Dynamic response-by-response models of matching behavior in rhesus monkeys. J. Exp. Anal. Behav. 84, 555–579 (2005).
Rudebeck, P. H. et al. A role for primate subgenual cingulate cortex in sustaining autonomic arousal. Proc. Natl Acad. Sci. USA 111, 5391–5396 (2014).
Cash-Padgett, T., Azab, H., Yoo, S. B. M. & Hayden, B. Y. Opposing pupil responses to offered and anticipated reward values. Anim. Cogn. 21, 671–684 (2018).
Ganguli, D. & Simoncelli, E. P. Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput. 26, 2103–2134 (2014).
Louie, K. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput. Biol. 18, e1010350 (2022).
Schütt, H. H., Kim, D. & Ma, W. J. Reward prediction error neurons implement an efficient code for reward. Nat. Neurosci. 27, 1333–1339 (2024).
Kahneman, D. & Tversky, A. Prospect theory: an analysis of decision under risk. Econometrica 47, 263–291 (1979).
Dayan, P. Improving generalization for temporal difference learning: the successor representation. Neural Comput. 5, 613–624 (1993).
Brunec, I. K. & Momennejad, I. Predictive representations in hippocampal and prefrontal hierarchies. J. Neurosci. 42, 299–312 (2022).
Yamada, H., Tymula, A., Louie, K. & Glimcher, P. W. Thirst-dependent risk preferences in monkeys identify a primitive form of wealth. Proc. Natl Acad. Sci. USA 110, 15788–15793 (2013).
Stauffer, W. R., Lak, A. & Schultz, W. Dopamine reward prediction error responses reflect marginal utility. Curr. Biol. 24, 2491–2500 (2014).
Kacelnik, A. & Bateson, M. Risky theories—the effects of variance on foraging decisions. Am. Zool. 36, 402–434 (1996).
Yoshimura, J., Ito, H., Miller, D. G. III & Tainaka, K.-I. Dynamic decision-making in uncertain environments: I. The principle of dynamic utility. J. Ethol. 31, 101–105 (2013).
Kagel, J. H., Green, L. & Caraco, T. When foragers discount the future: constraint or adaptation? Anim. Behav. 34, 271–283 (1986).
Soares, S., Atallah, B. V. & Paton, J. J. Midbrain dopamine neurons control judgment of time. Science 354, 1273–1277 (2016).
Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).
Soltani, A. & Izquierdo, A. Adaptive learning under expected and unexpected uncertainty. Nat. Rev. Neurosci. 20, 635–644 (2019).
Nassar, M. R. et al. Rational regulation of learning dynamics by pupil-linked arousal systems. Nat. Neurosci. 15, 1040–1046 (2012).
Sharpe, M. J. et al. Dopamine transients do not act as model-free prediction errors during associative learning. Nat. Commun. 11, 106 (2020).
Engelhard, B. et al. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature 570, 509–513 (2019).
Sharpe, M. J. et al. Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat. Neurosci. 20, 735–742 (2017).
Jeong, H. et al. Mesolimbic dopamine release conveys causal associations. Science 378, eabq6740 (2022).
Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).
Tesauro, G. Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992).
Bornstein, A. M. & Daw, N. D. Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Curr. Opin. Neurobiol. 21, 374–380 (2011).
Yin, H. H. et al. Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12, 333–341 (2009).
Hamid, A. A., Frank, M. J. & Moore, C. I. Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment. Cell 184, 2733–2749 (2021).
Cruz, B. F. et al. Action suppression reveals opponent parallel control via striatal circuits. Nature 607, 521–526 (2022).
Lee, R. S., Sagiv, Y., Engelhard, B., Witten, I. B. & Daw, N. D. A feature-specific prediction error model explains dopaminergic heterogeneity. Nat. Neurosci. 27, 1574–1586 (2024).
Takahashi, Y. K. et al. Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model. Nat. Neurosci. 26, 830–839 (2023).
Balsam, P. D. & Gallistel, C. R. Temporal maps and informativeness in associative learning. Trends Neurosci. 32, 73–78 (2009).
International Brain Laboratory. Behavior, Appendix 1: IBL protocol for headbar implant surgery in mice. Figshare https://doi.org/10.6084/m9.figshare.11634726.v5 (2020).
Uchida, N. & Mainen, Z. F. Speed and accuracy of olfactory discrimination in the rat. Nat. Neurosci. 6, 1224–1229 (2003).
Lopes, G. et al. Bonsai: an event-based framework for processing and controlling data streams. Front. Neuroinform. 9, 7 (2015).
Siegle, J. H. et al. Open Ephys: an open-source, plugin-based platform for multichannel electrophysiology. J. Neural Eng. 14, 045003 (2017).
Bell, A. J. & Sejnowski, T. J. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159 (1995).
Hyvärinen, A. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Netw. 10, 626–634 (1999).
Kvitsiani, D. et al. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature 498, 363–366 (2013).
Hill, D. N., Mehta, S. B. & Kleinfeld, D. Quality metrics to accompany spike sorting of extracellular signals. J. Neurosci. 31, 8699–8705 (2011).
Ludvig, E. A., Sutton, R. S. & Kehoe, E. J. Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput. 20, 3034–3054 (2008).
Rowland, M. et al. Statistics and samples in distributional reinforcement learning. Proc. Mach. Learn. Res. 97, 5528–5536 (2019).
Newey, W. K. & Powell, J. L. Asymmetric least squares estimation and testing. Econometrica 55, 819–847 (1987).
Brunel, N. & Nadal, J.-P. Mutual information, Fisher information, and population coding. Neural Comput. 10, 1731–1757 (1998).
Glimcher, P. W. & Fehr, E. Neuroeconomics: Decision Making and the Brain (Academic Press, 2013).
Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).
Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
Lee, K. et al. Temporally restricted dopaminergic control of reward-conditioned movements. Nat. Neurosci. 23, 209–216 (2020).
Stauffer, W. R., Lak, A., Kobayashi, S. & Schultz, W. Components and characteristics of the dopamine reward utility signal. J. Comp. Neurol. 524, 1699–1711 (2016).
Kobayashi, S. & Schultz, W. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846 (2008).
Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).
Yagle, A. E. Regularized matrix computations. Preprint at https://api.semanticscholar.org/CorpusID:7810635 (2005).
Chakravarti, N. Isotonic median regression: a linear programming approach. Math. Oper. Res. 14, 303–308 (1989).
Picheny, V., Moss, H., Torossian, L. & Durrande, N. Bayesian quantile and expectile optimisation. Proc. Mach. Learn. Res. 180, 1623–1633 (2022).
Sousa, M. et al. A multidimensional distributional map of future reward in dopamine neurons. Figshare https://doi.org/10.6084/m9.figshare.28390151.v1 (2025).