Bellemare, M. G., Dabney, W. & Rowland, M. Distributional Reinforcement Learning (MIT Press, 2023).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).
Shin, J. H., Kim, D. & Jung, M. W. Differential coding of reward and movement information in the dorsomedial striatal direct and indirect pathways. Nat. Commun. 9, 404 (2018).
Nonomura, S. et al. Monitoring and updating of action selection for goal-directed behavior through the striatal direct and indirect pathways. Neuron 99, 1302–1314.e5 (2018).
Hikida, T., Kimura, K., Wada, N., Funabiki, K. & Nakanishi, S. Distinct roles of synaptic transmission in direct and indirect striatal pathways to reward and aversive behavior. Neuron 66, 896–907 (2010).
Kravitz, A. V., Tye, L. D. & Kreitzer, A. C. Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nat. Neurosci. 15, 816–818 (2012).
Tai, L.-H., Lee, A. M., Benavidez, N., Bonci, A. & Wilbrecht, L. Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value. Nat. Neurosci. 15, 1281–1289 (2012).
Cruz, B. F. et al. Action suppression reveals opponent parallel control via striatal circuits. Nature 607, 521–526 (2022).
Floresco, S. B. The nucleus accumbens: an interface between cognition, emotion, and action. Annu. Rev. Psychol. 66, 25–52 (2015).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).
Yagishita, S. et al. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science 345, 1616–1620 (2014).
Iino, Y. et al. Dopamine D2 receptors in discrimination learning and spine enlargement. Nature 579, 555–560 (2020).
Lee, S. J. et al. Cell-type-specific asynchronous modulation of PKA by dopamine in learning. Nature 590, 451–456 (2021).
Ito, M. & Doya, K. Distinct neural representation in the dorsolateral, dorsomedial, and ventral parts of the striatum during fixed- and free-choice tasks. J. Neurosci. 35, 3499–3514 (2015).
Shin, E. J. et al. Robust and distributed neural representation of action values. eLife 10, e53045 (2021).
Hattori, R., Danskin, B., Babic, Z., Mlynaryk, N. & Komiyama, T. Area-specificity and plasticity of history-dependent value coding during learning. Cell 177, 1858–1872.e15 (2019).
Hirokawa, J., Vaughan, A., Masset, P., Ott, T. & Kepecs, A. Frontal cortex neuron types categorically encode single decision variables. Nature 576, 446–451 (2019).
Ottenheimer, D. J., Hjort, M. M., Bowen, A. J., Steinmetz, N. A. & Stuber, G. D. A stable, distributed code for cue value in mouse cortex during reward learning. eLife 12, RP84604 (2023).
Watabe-Uchida, M. & Uchida, N. Multiple dopamine systems: weal and woe of dopamine. Cold Spring Harb. Symp. Quant. Biol. 83, 83–95 (2018).
de Jong, J. W. et al. A neural circuit mechanism for encoding aversive stimuli in the mesolimbic dopamine system. Neuron 101, 133–151.e7 (2019).
Tsutsui-Kimura, I. et al. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. eLife 9, e62390 (2020).
Engelhard, B. et al. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature 570, 509–513 (2019).
Akiti, K. et al. Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction. Neuron 110, 3789–3804.e9 (2022).
Lee, R. S., Sagiv, Y., Engelhard, B., Witten, I. B. & Daw, N. D. A feature-specific prediction error model explains dopaminergic heterogeneity. Nat. Neurosci. 27, 1574–1586 (2024).
Jeong, H. et al. Mesolimbic dopamine release conveys causal associations. Science 378, eabq6740 (2022).
Coddington, L. T., Lindo, S. E. & Dudman, J. T. Mesolimbic dopamine adapts the rate of learning from action. Nature 614, 294–302 (2023).
Costa, V. D., Dal Monte, O., Lucas, D. R., Murray, E. A. & Averbeck, B. B. Amygdala and ventral striatum make distinct contributions to reinforcement learning. Neuron 92, 505–517 (2016).
St Onge, J. R. & Floresco, S. B. Dopaminergic modulation of risk-based decision making. Neuropsychopharmacology 34, 681–697 (2009).
Zalocusky, K. A. et al. Nucleus accumbens D2R cells signal prior outcomes and control risky decision-making. Nature 531, 642–646 (2016).
Ma, W. J., Beck, J. M., Latham, P. E. & Pouget, A. Bayesian inference with probabilistic population codes. Nat. Neurosci. 9, 1432–1438 (2006).
Walker, E. Y. et al. Studying the neural representations of uncertainty. Nat. Neurosci. 26, 1857–1867 (2023).
Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 449–458 (PMLR, 2017).
Wurman, P. R. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022).
Rothenhoefer, K. M., Hong, T., Alikaya, A. & Stauffer, W. R. Rare rewards amplify dopamine responses. Nat. Neurosci. 24, 465–469 (2021).
Avvisati, R. et al. Distributional coding of associative learning in discrete populations of midbrain dopamine neurons. Cell Rep. 43, 114080 (2024).
Sousa, M., Bujalski, P., Cruz, B. F., Louie, K. & Paton, J. J. Dopamine neurons encode a multidimensional probabilistic map of future reward. Preprint at bioRxiv https://doi.org/10.1101/2023.11.12.566727 (2023).
Muller, T. H. et al. Distributional reinforcement learning in prefrontal cortex. Nat. Neurosci. 27, 403–408 (2024).
Rowland, M. et al. Statistics and samples in distributional reinforcement learning. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 5528–5536 (PMLR, 2019).
Tano, P., Dayan, P. & Pouget, A. A local temporal difference code for distributional reinforcement learning. In Proc. Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 13662–13673 (NeurIPS, 2020).
Louie, K. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput. Biol. 18, e1010350 (2022).
Schütt, H. H., Kim, D. & Ma, W. J. Reward prediction error neurons implement an efficient code for reward. Nat. Neurosci. 27, 1333–1339 (2024).
O’Neill, M. & Schultz, W. Coding of reward risk by orbitofrontal neurons is mostly distinct from coding of reward value. Neuron 68, 789–800 (2010).
Monosov, I. E. & Hikosaka, O. Selective and graded coding of reward uncertainty by neurons in the primate anterodorsal septal region. Nat. Neurosci. 16, 756–762 (2013).
White, J. K. & Monosov, I. E. Neurons in the primate dorsal striatum signal the uncertainty of object–reward associations. Nat. Commun. 7, 12735 (2016).
Yanike, M. & Ferrera, V. P. Representation of outcome risk and action in the anterior caudate nucleus. J. Neurosci. 34, 3279–3290 (2014).
Yamada, K. & Toda, K. Pupillary dynamics of mice performing a Pavlovian delay conditioning task reflect reward-predictive signals. Front. Syst. Neurosci. 16, 1045764 (2022).
Tian, J. et al. Distributed and mixed information in monosynaptic inputs to dopamine neurons. Neuron 91, 1374–1389 (2016).
Stringer, C. et al. Spontaneous behaviors drive multidimensional, brainwide activity. Science 364, 255 (2019).
Musall, S., Kaufman, M. T., Juavinett, A. L., Gluf, S. & Churchland, A. K. Single-trial neural dynamics are dominated by richly varied movements. Nat. Neurosci. 22, 1677–1686 (2019).
Hoyer, P. & Hyvärinen, A. Interpreting neural response variability as Monte Carlo sampling of the posterior. In Proc. Advances in Neural Information Processing Systems 15 (eds Becker, S. et al.) 293–300 (MIT Press, 2002).
Orbán, G., Berkes, P., Fiser, J. & Lengyel, M. Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron 92, 530–543 (2016).
Bernardi, S. et al. The geometry of abstraction in the hippocampus and prefrontal cortex. Cell 183, 954–967.e21 (2020).
Lowet, A. S., Zheng, Q., Matias, S., Drugowitsch, J. & Uchida, N. Distributional reinforcement learning in the brain. Trends Neurosci. 43, 980–997 (2020).
Gerfen, C. R. & Surmeier, D. J. Modulation of striatal projection systems by dopamine. Annu. Rev. Neurosci. 34, 441–466 (2011).
Faust, T. W., Mohebi, A. & Berke, J. D. Reward expectation selectively boosts the firing of accumbens D1+ neurons during motivated approach. Preprint at bioRxiv https://doi.org/10.1101/2023.09.02.556060 (2023).
Martiros, N., Kapoor, V., Kim, S. E. & Murthy, V. N. Distinct representation of cue-outcome association by D1 and D2 neurons in the ventral striatum’s olfactory tubercle. eLife 11, e75463 (2022).
Nishioka, T. et al. Error-related signaling in nucleus accumbens D2 receptor-expressing neurons guides inhibition-based choice behavior in mice. Nat. Commun. 14, 2284 (2023).
Kupchik, Y. M. et al. Coding the direct/indirect pathways by D1 and D2 receptors is not valid for accumbens projections. Nat. Neurosci. 18, 1230–1232 (2015).
Such, F. P. et al. An Atari model zoo for analyzing, visualizing, and comparing deep reinforcement learning agents. In Proc. 28th International Joint Conference on Artificial Intelligence (ed. Kraus, S.) 3260–3267 (IJCAI, 2019).
Collins, A. G. E. & Frank, M. J. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychol. Rev. 121, 337–366 (2014).
Gjorgjieva, J., Sompolinsky, H. & Meister, M. Benefits of pathway splitting in sensory coding. J. Neurosci. 34, 12127–12144 (2014).
Ichinose, T. & Habib, S. ON and OFF signaling pathways in the retina and the visual system. Front. Ophthalmol. 2, 989002 (2022).
Poulin, J.-F., Gaertner, Z., Moreno-Ramos, O. A. & Awatramani, R. Classification of midbrain dopamine neurons using single-cell gene expression profiling approaches. Trends Neurosci. 43, 155–169 (2020).
Wenliang, L. K. et al. Distributional Bellman operators over mean embeddings. In Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) 52839–52868 (PMLR, 2024).
Mikhael, J. G. & Bogacz, R. Learning reward uncertainty in the basal ganglia. PLoS Comput. Biol. 12, e1005062 (2016).
Cui, G. et al. Concurrent activation of striatal direct and indirect pathways during action initiation. Nature 494, 238–242 (2013).
Markowitz, J. E. et al. The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44–58 (2018).
Tan, B. et al. Dynamic processing of hunger and thirst by common mesolimbic neural ensembles. Proc. Natl Acad. Sci. USA 119, e2211688119 (2022).
Bar-Gad, I., Morris, G. & Bergman, H. Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Prog. Neurobiol. 71, 439–473 (2003).
Barth-Maron, G. et al. Distributed distributional deterministic policy gradients. In Proc. 6th International Conference on Learning Representations 4855–4870 (ICLR, 2018).
Brown, V. M. et al. Reinforcement learning disruptions in individuals with depression and sensitivity to symptom change following cognitive behavioral therapy. JAMA Psychiatry 78, 1113–1122 (2021).
Gueguen, M. C. M., Schweitzer, E. M. & Konova, A. B. Computational theory-driven studies of reinforcement learning and decision-making in addiction: what have we learned? Curr. Opin. Behav. Sci. 38, 40–48 (2021).
Paxinos, G. & Franklin, K. B. J. Paxinos and Franklin’s the Mouse Brain in Stereotaxic Coordinates (Academic Press, 2019).
Gong, S. et al. A gene expression atlas of the central nervous system based on bacterial artificial chromosomes. Nature 425, 917–925 (2003).
Gong, S. et al. Targeting Cre recombinase to specific neuron populations with bacterial artificial chromosome constructs. J. Neurosci. 27, 9817–9823 (2007).
Gerfen, C. R., Paletzki, R. & Heintz, N. GENSAT BAC cre-recombinase driver lines to study the functional organization of cerebral cortical and basal ganglia circuits. Neuron 80, 1368–1383 (2013).
Govorunova, E. G., Sineshchekov, O. A., Janz, R., Liu, X. & Spudich, J. L. Natural light-gated anion channels: a family of microbial rhodopsins for advanced optogenetics. Science 349, 647–650 (2015).
Li, N. et al. Spatiotemporal constraints on optogenetic inactivation in cortical circuits. eLife 8, e48622 (2019).
Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
Thiele, S. L., Warre, R. & Nash, J. E. Development of a unilaterally-lesioned 6-OHDA mouse model of Parkinson’s disease. J. Vis. Exp. 60, e3234 (2012).
Dana, H. et al. High-performance calcium sensors for imaging activity in neuronal populations and microcompartments. Nat. Methods 16, 649–657 (2019).
Klapoetke, N. C. et al. Independent optical excitation of distinct neural populations. Nat. Methods 11, 338–346 (2014).
Lee, J. & Sabatini, B. L. Striatal indirect pathway mediates exploration via collicular competition. Nature 599, 645–649 (2021).
Uchida, N. & Mainen, Z. F. Speed and accuracy of olfactory discrimination in the rat. Nat. Neurosci. 6, 1224–1229 (2003).
Pavlov, I. P. Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex (Oxford Univ. Press, 1927).
Jun, J. J. et al. Fully integrated silicon probes for high-density recording of neural activity. Nature 551, 232–236 (2017).
Steinmetz, N. A. et al. Neuropixels 2.0: a miniaturized high-density probe for stable, long-term brain recordings. Science 372, eabf4588 (2021).
Pachitariu, M., Sridhar, S., Pennington, J. & Stringer, C. Spike sorting with Kilosort4. Nat. Methods 21, 914–921 (2024).
Zhou, Z. C. et al. Deep-brain optical recording of neural dynamics during behavior. Neuron 111, 3716–3738 (2023).
Pachitariu, M. et al. Suite2p: beyond 10,000 neurons with standard two-photon microscopy. Preprint at bioRxiv https://doi.org/10.1101/061507 (2017).
Friedrich, J., Zhou, P. & Paninski, L. Fast online deconvolution of calcium imaging data. PLoS Comput. Biol. 13, e1005423 (2017).
Lopes, G. et al. Bonsai: an event-based framework for processing and controlling data streams. Front. Neuroinform. 9, 7 (2015).
Pisanello, M. et al. Tailoring light delivery for optogenetics by modal demultiplexing in tapered optical fibers. Sci. Rep. 8, 4467 (2018).
Lee, J., Wang, W. & Sabatini, B. L. Anatomically segregated basal ganglia pathways allow parallel behavioral modulation. Nat. Neurosci. 23, 1388–1398 (2020).
Sanders, J. I. & Kepecs, A. A low-cost programmable pulse generator for physiology and behavior. Front. Neuroeng. 7, 43 (2014).
Shamash, P., Carandini, M., Harris, K. & Steinmetz, N. A tool for analyzing electrode tracks from slice histology. Preprint at bioRxiv https://doi.org/10.1101/447995 (2018).
Wang, Q. et al. The Allen Mouse Brain Common Coordinate Framework: a 3D reference atlas. Cell 181, 936–953.e20 (2020).
Claudi, F. et al. Visualizing anatomically registered data with brainrender. eLife 10, e65751 (2021).
Chon, U., Vanselow, D. J., Cheng, K. C. & Kim, Y. Enhanced and unified anatomical labeling for a common mouse brain atlas. Nat. Commun. 10, 5067 (2019).
Claudi, F. et al. BrainGlobe Atlas API: a common interface for neuroanatomical atlases. J. Open Source Softw. 5, 2668 (2020).
Hintiryan, H. et al. The mouse cortico-striatal projectome. Nat. Neurosci. 19, 1100–1114 (2016).
Peters, A. J., Fabre, J. M. J., Steinmetz, N. A., Harris, K. D. & Carandini, M. Striatal activity topographically reflects cortical activity. Nature 591, 420–425 (2021).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 56–61 (SciPy, 2010).
Buitinck, L. et al. API design for machine learning software: experiences from the scikit-learn project. In Proc. ECML PKDD Workshop: Languages for Data Mining and Machine Learning (eds Crémilleux, B. et al.) 108–122 (ECML PKDD, 2013).
Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 92–96 (SciPy, 2010).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Dietterich, T. G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10, 1895–1923 (1998).
Pillow, J. W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454, 995–999 (2008).
Yuan, M. & Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Series B Stat. Methodol. 68, 49–67 (2006).
Tseng, S.-Y., Chettih, S. N., Arlt, C., Barroso-Luque, R. & Harvey, C. D. Shared and specialized coding across posterior cortical areas for dynamic navigation decisions. Neuron 110, 2484–2502.e16 (2022).
Churchland, M. M. et al. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nat. Neurosci. 13, 369–378 (2010).
Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).
Rescorla, R. A. & Wagner, A. R. in Classical Conditioning II: Current Research and Theory (eds Black, A. H. & Prokasy, W. F.) 64–99 (Appleton-Century-Crofts, 1972).
Gurney, K. N., Humphries, M. D. & Redgrave, P. A new framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement–action interface. PLoS Biol. 13, e1002034 (2015).
Rice, M. E. & Cragg, S. J. Dopamine spillover after quantal release: rethinking dopamine transmission in the nigrostriatal pathway. Brain Res. Rev. 58, 303–313 (2008).
Dreyer, J. K., Herrik, K. F., Berg, R. W. & Hounsgaard, J. D. Influence of phasic and tonic dopamine release on receptor activation. J. Neurosci. 30, 14273–14283 (2010).
Dabney, W., Rowland, M., Bellemare, M. & Munos, R. Distributional reinforcement learning with quantile regression. In Proc. 32nd AAAI Conference on Artificial Intelligence (eds McIlraith, S. A. & Weinberger, K. Q.) 2892–2901 (AAAI Press, 2018).
Huber, P. J. Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101 (1964).
Romero Pinto, S. & Uchida, N. Tonic dopamine and biases in value learning linked through a biologically inspired reinforcement learning model. Preprint at bioRxiv https://doi.org/10.1101/2023.11.10.566580 (2023).
Lowet, A. S. et al. Data from: an opponent striatal circuit for distributional reinforcement learning. Dryad https://doi.org/10.5061/dryad.80gb5mm0m (2024).
Lowet, A. S. alowet/distributionalRL: Publication-ready version (v1.0.2). Zenodo https://doi.org/10.5281/zenodo.14554845 (2024).
Chandak, Y. et al. Universal off-policy evaluation. In Proc. Advances in Neural Information Processing Systems 34 (eds Ranzato, M. et al.) 27475–27490 (NeurIPS, 2021).
Gagne, C. & Dayan, P. Peril, prudence and planning as risk, avoidance and worry. J. Math. Psychol. 106, 102617 (2022).
Rockafellar, R. T. & Uryasev, S. Optimization of conditional value-at-risk. J. Risk 2, 21–41 (2000).
Fiser, J., Berkes, P., Orbán, G. & Lengyel, M. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn. Sci. 14, 119–130 (2010).