
A multidimensional distributional map of future reward in dopamine neurons

  • Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).

  • Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).


  • Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).

  • Dabney, W., Rowland, M., Bellemare, M. & Munos, R. Distributional reinforcement learning with quantile regression. In Proc. 32nd AAAI Conference on Artificial Intelligence 2892–2901 (AAAI, 2018).

  • Lyle, C., Bellemare, M. G. & Castro, P. S. A comparative analysis of expected and distributional reinforcement learning. In Proc. 33rd AAAI Conference on Artificial Intelligence 4504–4511 (AAAI, 2019).

  • Bellemare, M. G., Dabney, W. & Rowland, M. Distributional Reinforcement Learning (MIT Press, 2023).

  • Muller, T. H. et al. Distributional reinforcement learning in prefrontal cortex. Nat. Neurosci. 27, 403–408 (2024).

  • Avvisati, R. et al. Distributional coding of associative learning in discrete populations of midbrain dopamine neurons. Cell Rep. 43, 114080 (2024).

  • Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).

  • Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. Proc. Mach. Learn. Res. 70, 449–458 (2017).


  • Martin, J., Lyskawinski, M., Li, X. & Englot, B. Stochastically dominant distributional reinforcement learning. Proc. Mach. Learn. Res. 119, 6745–6754 (2020).


  • Théate, T. & Ernst, D. Risk-sensitive policy with distributional reinforcement learning. Algorithms 16, 325 (2023).

  • Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, 2014).

  • Kurth-Nelson, Z. & Redish, A. D. Temporal-difference reinforcement learning with distributed representations. PLoS ONE 4, e7362 (2009).

  • Fedus, W., Gelada, C., Bengio, Y., Bellemare, M. G. & Larochelle, H. Hyperbolic discounting and learning over multiple horizons. Preprint at https://doi.org/10.48550/arXiv.1902.06865 (2019).

  • Janner, M., Mordatch, I. & Levine, S. Gamma-models: generative temporal difference learning for infinite-horizon prediction. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (eds Larochelle, H. et al.) 1724–1735 (Curran Associates, 2020).

  • Thakoor, S. et al. Generalised policy improvement with geometric policy composition. Proc. Mach. Learn. Res. 162, 21272–21307 (2022).


  • Shankar, K. H. & Howard, M. W. A scale-invariant internal representation of time. Neural Comput. 24, 134–193 (2012).

  • Tano, P., Dayan, P. & Pouget, A. A local temporal difference code for distributional reinforcement learning. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (eds Larochelle, H. et al.) 13662–13673 (Curran Associates, 2020).

  • Tiganj, Z., Gershman, S. J., Sederberg, P. B. & Howard, M. W. Estimating scale-invariant future in continuous time. Neural Comput. 31, 681–709 (2019).

  • Barlow, H. B. in Sensory Communication (ed. Rosenblith, W. A.) 217–234 (MIT Press, 1961).

  • Laughlin, S. A simple coding procedure enhances a neuron’s information capacity. Z. Naturforsch. C 36, 910–912 (1981).

  • Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).

  • Fairhall, A. L., Lewen, G. D., Bialek, W. & de Ruyter Van Steveninck, R. R. Efficiency and ambiguity in an adaptive neural code. Nature 412, 787–792 (2001).

  • Lau, B. & Glimcher, P. W. Dynamic response-by-response models of matching behavior in rhesus monkeys. J. Exp. Anal. Behav. 84, 555–579 (2005).

  • Rudebeck, P. H. et al. A role for primate subgenual cingulate cortex in sustaining autonomic arousal. Proc. Natl Acad. Sci. USA 111, 5391–5396 (2014).

  • Cash-Padgett, T., Azab, H., Yoo, S. B. M. & Hayden, B. Y. Opposing pupil responses to offered and anticipated reward values. Anim. Cogn. 21, 671–684 (2018).

  • Ganguli, D. & Simoncelli, E. P. Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput. 26, 2103–2134 (2014).

  • Louie, K. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput. Biol. 18, e1010350 (2022).

  • Schütt, H. H., Kim, D. & Ma, W. J. Reward prediction error neurons implement an efficient code for reward. Nat. Neurosci. 27, 1333–1339 (2024).

  • Kahneman, D. & Tversky, A. Prospect theory: an analysis of decision under risk. Econometrica 47, 263–291 (1979).

  • Dayan, P. Improving generalization for temporal difference learning: the successor representation. Neural Comput. 5, 613–624 (1993).

  • Brunec, I. K. & Momennejad, I. Predictive representations in hippocampal and prefrontal hierarchies. J. Neurosci. 42, 299–312 (2022).

  • Yamada, H., Tymula, A., Louie, K. & Glimcher, P. W. Thirst-dependent risk preferences in monkeys identify a primitive form of wealth. Proc. Natl Acad. Sci. USA 110, 15788–15793 (2013).

  • Stauffer, W. R., Lak, A. & Schultz, W. Dopamine reward prediction error responses reflect marginal utility. Curr. Biol. 24, 2491–2500 (2014).

  • Kacelnik, A. & Bateson, M. Risky theories—the effects of variance on foraging decisions. Am. Zool. 36, 402–434 (1996).

  • Yoshimura, J., Ito, H., Miller III, D. G. & Tainaka, K.-I. Dynamic decision-making in uncertain environments: I. The principle of dynamic utility. J. Ethol. 31, 101–105 (2013).

  • Kagel, J. H., Green, L. & Caraco, T. When foragers discount the future: constraint or adaptation? Anim. Behav. 34, 271–283 (1986).

  • Soares, S., Atallah, B. V. & Paton, J. J. Midbrain dopamine neurons control judgment of time. Science 354, 1273–1277 (2016).

  • Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).

  • Soltani, A. & Izquierdo, A. Adaptive learning under expected and unexpected uncertainty. Nat. Rev. Neurosci. 20, 635–644 (2019).

  • Nassar, M. R. et al. Rational regulation of learning dynamics by pupil-linked arousal systems. Nat. Neurosci. 15, 1040–1046 (2012).

  • Sharpe, M. J. et al. Dopamine transients do not act as model-free prediction errors during associative learning. Nat. Commun. 11, 106 (2020).

  • Engelhard, B. et al. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature 570, 509–513 (2019).

  • Sharpe, M. J. et al. Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat. Neurosci. 20, 735–742 (2017).

  • Jeong, H. et al. Mesolimbic dopamine release conveys causal associations. Science 378, eabq6740 (2022).

  • Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).

  • Tesauro, G. Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992).

  • Bornstein, A. M. & Daw, N. D. Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Curr. Opin. Neurobiol. 21, 374–380 (2011).

  • Yin, H. H. et al. Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12, 333–341 (2009).

  • Hamid, A. A., Frank, M. J. & Moore, C. I. Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment. Cell 184, 2733–2749 (2021).

  • Cruz, B. F. et al. Action suppression reveals opponent parallel control via striatal circuits. Nature 607, 521–526 (2022).

  • Lee, R. S., Sagiv, Y., Engelhard, B., Witten, I. B. & Daw, N. D. A feature-specific prediction error model explains dopaminergic heterogeneity. Nat. Neurosci. 27, 1574–1586 (2024).

  • Takahashi, Y. K. et al. Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model. Nat. Neurosci. 26, 830–839 (2023).

  • Balsam, P. D. & Gallistel, C. R. Temporal maps and informativeness in associative learning. Trends Neurosci. 32, 73–78 (2009).

  • International Brain Laboratory. Behavior, Appendix 1: IBL protocol for headbar implant surgery in mice. Figshare https://doi.org/10.6084/m9.figshare.11634726.v5 (2020).

  • Uchida, N. & Mainen, Z. F. Speed and accuracy of olfactory discrimination in the rat. Nat. Neurosci. 6, 1224–1229 (2003).

  • Lopes, G. et al. Bonsai: an event-based framework for processing and controlling data streams. Front. Neuroinform. 9, 7 (2015).

  • Siegle, J. H. et al. Open Ephys: an open-source, plugin-based platform for multichannel electrophysiology. J. Neural Eng. 14, 045003 (2017).

  • Bell, A. J. & Sejnowski, T. J. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159 (1995).

  • Hyvärinen, A. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Netw. 10, 626–634 (1999).

  • Kvitsiani, D. et al. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature 498, 363–366 (2013).

  • Hill, D. N., Mehta, S. B. & Kleinfeld, D. Quality metrics to accompany spike sorting of extracellular signals. J. Neurosci. 31, 8699–8705 (2011).

  • Ludvig, E. A., Sutton, R. S. & Kehoe, E. J. Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput. 20, 3034–3054 (2008).

  • Rowland, M. et al. Statistics and samples in distributional reinforcement learning. Proc. Mach. Learn. Res. 97, 5528–5536 (2019).


  • Newey, W. K. & Powell, J. L. Asymmetric least squares estimation and testing. Econometrica 55, 819–847 (1987).

  • Brunel, N. & Nadal, J.-P. Mutual information, Fisher information, and population coding. Neural Comput. 10, 1731–1757 (1998).

  • Glimcher, P. W. & Fehr, E. Neuroeconomics: Decision Making and the Brain (Academic Press, 2013).

  • Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).

  • Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).

  • Lee, K. et al. Temporally restricted dopaminergic control of reward-conditioned movements. Nat. Neurosci. 23, 209–216 (2020).

  • Stauffer, W. R., Lak, A., Kobayashi, S. & Schultz, W. Components and characteristics of the dopamine reward utility signal. J. Comp. Neurol. 524, 1699–1711 (2016).

  • Kobayashi, S. & Schultz, W. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846 (2008).

  • Mathis, A., Mamidanna, P. & Cury, K. M. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).

  • Yagle, A. E. Regularized matrix computations. Preprint at https://api.semanticscholar.org/CorpusID:7810635 (2005).

  • Chakravarti, N. Isotonic median regression: a linear programming approach. Math. Oper. Res. 14, 303–308 (1989).

  • Picheny, V., Moss, H., Torossian, L. & Durrande, N. Bayesian quantile and expectile optimisation. Proc. Mach. Learn. Res. 180, 1623–1633 (2022).


  • Sousa, M. et al. A multidimensional distributional map of future reward in dopamine neurons. Figshare https://doi.org/10.6084/m9.figshare.28390151.v1 (2025).
