Striatum supports fast learning but not memory recall

All procedures were carried out in accordance with the President and Fellows of Harvard College Institutional Animal Care and Use Committee protocol #IS00000571-6.

Sex of mice

We used male and female mice in an approximately equal ratio (n = 65 males and n = 62 females). We did not observe any differences in the cued reaching behaviour between the sexes. All figures include both males and females.

Housing of mice

Animals were housed on a reverse light cycle in groups (females) or singly housed (males). The room ambient temperature was 75 °F, and the relative humidity was 45%.

Food restriction and habituation to head restraint

We weighed each head-plated and intracranially virally transduced mouse (see below) before beginning food restriction. During food restriction, we limited available chow to reduce the weight of each animal to approximately 85% of the pre-restriction weight of that animal. We switched the daily food from regular animal facility chow to Bio-Serv chocolate-flavoured, nutritionally complete food pellets (item number: F05301). We then began to handle the mice, as follows. On day 1, we habituated the mice to a gloved hand in the home cage and attempted to feed the mice peanut butter from the tip of the gloved finger. On days 2 and 3, we continued to feed the mice peanut butter and habituated mice to handling. On day 4, we began head restraint and fed the mice peanut butter while the head was restrained. On day 5, we fed the mice food pellets while the head was restrained. We presented the food pellets directly to the mouth by loosely attaching each food pellet to a wooden stick, using sticky peanut butter. The mouse could use its tongue and mouth to retrieve the food pellet and consume it. Once the mice comfortably ate food pellets while the head was restrained, we switched the mice to reach training (see the next section ‘Training the forelimb reaching behaviour’).

Training the forelimb reaching behaviour

Forelimb reach training of mice (at least 2 months of age) was accomplished through manual interactions with the food-restricted and head-restrained mice over several days, according to the following stages. In stage 1, we taught each mouse to reach forwards with the right forelimb to touch a wooden stick. As a reward, we provided the mouse with a food pellet that was loosely attached to the stick with peanut butter, bringing the food pellet directly to the mouth of the mouse. In stage 2, we placed the food pellet at the end of the stick and required the mouse to push the food pellet off the stick and into its mouth. For stage 3, we gradually lowered the stick with the pellet until the mice reached forwards to the level of the food pellet presenter mechanism, located below and in front of the nose of the mouse. In stage 4, we removed the stick, requiring the mice to directly pick up the food pellets from the pellet presenter mechanism. During these manual interaction stages, we trained the mice on a behavioural rig that closely resembled the automated rig but with more space for the experimenter to interact with the mouse. We subsequently transitioned the mice to the automated behavioural rig, which included automated mechanisms for presenting pellets and was enclosed in a light-tight box (Extended Data Fig. 1a). On this rig, we trained mice to consistently and successfully pick up pellets in the dark³¹. Once the mice became proficient at reaching, we introduced a food-predicting cue (as described in the next section ‘Training mice to associate a cue with the food pellet’).

Training mice to associate a cue with the food pellet

All cue training took place in an enclosed, dark, light-proof, sound-insulated behavioural box. Automated mechanisms, controlled by an Arduino, positioned the food pellets directly in front of and below the snout of the mouse (Extended Data Fig. 1a). After the mice became proficient at obtaining food pellets in the dark, we introduced the food-predicting cue. The trial structure was as follows (Extended Data Fig. 1b). The pellet moved into position in front of the mouse over 1.28 s. Following a 0.22-s delay, the cue turned on. The pellet remained stationary in front of the mouse for an additional 8 s before moving out of reach.

The ‘pellet occupancy’ is the likelihood that a pellet will be available in front of the mouse at any given time, unless the mouse has dislodged the pellet by reaching for it. The pellet occupancy was determined by the frequency of pellet loading. During the initial days on the automated rig, we trained the mice with a high pellet occupancy (80%) to provide them with ample practice in reaching for food pellets. Once the motor kinematics of the reaching movements stabilized, we reduced the pellet occupancy to 30%.

To prevent the mice from using the sound of the pellet presenter mechanism as a cue, (1) we continuously played an audio recording of the pellet presenter mechanism in motion, as a masking sound, and (2) the mechanism moved without presenting the pellet 70% of the time. This resulted in a 30% pellet occupancy. The sound of the pellet presenter mechanism was therefore not a reliable food-predicting cue.

To establish the inter-trial interval (ITI), we randomly selected a time interval from a uniform distribution between 0 and 3.5 s, as the first part of the ITI. Then, the automated behavioural rig entered one of two states. In state 1, occurring 30% of the time, the next trial began immediately. In state 2, occurring 70% of the time, the ITI continued for another 9.5–13 s, while the pellet presenter mechanism moved without presenting any pellet. Generally, mice did not reach before the cue (see ‘Behavioural sessions included or excluded’), and mice appeared unable to time the ITI using an internal clock (Extended Data Fig. 4b,e,f).
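The ITI logic described above can be illustrated with a minimal Python sketch. This is not the Arduino code that ran the rig; the function name `sample_iti` is hypothetical, and drawing the additional 9.5–13 s uniformly is an assumption, because the distribution of that second interval is not stated.

```python
import random

def sample_iti(rng, p_trial=0.30):
    """Sketch of one pass through the ITI logic.

    Returns (iti_seconds, starts_trial). `rng` is a random.Random instance.
    """
    # First part of the ITI: uniform between 0 and 3.5 s (from the text).
    iti = rng.uniform(0.0, 3.5)
    # State 1 (30% of the time): the next trial begins immediately.
    if rng.random() < p_trial:
        return iti, True
    # State 2 (70% of the time): the mechanism moves without a pellet
    # for another 9.5-13 s. ASSUMPTION: this extra time is drawn uniformly.
    iti += rng.uniform(9.5, 13.0)
    return iti, False

rng = random.Random(0)
samples = [sample_iti(rng) for _ in range(10000)]
```

Under these assumptions, roughly 30% of passes end in a trial, which matches the 30% pellet occupancy described above.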

Catch trials

In a random 10% of trials when the cue turned on, the pellet was omitted. These catch trials were included to test whether the mouse paid attention to the cue or paid attention to the presence of the pellet.

Preventing the mice from cheating

To encourage the mice to focus on the optogenetic cue and prevent them from using sensory systems to detect the presence of the food pellet through other means, we implemented the following strategies:

(1) We played a continuous, loud sound, which was pre-recorded audio of the pellet presenter mechanism, specifically, the stepper motor, through speakers positioned to the left and right of the mouse. This was done to mask the sound of the stepper motor.

(2) We placed fresh food pellets out of the reach of the mouse to mask the smell of the pellet that was directly in front of the mouse.

(3) A CPU fan was positioned to blow air continuously towards the nose of the mouse to prevent olfactory detection of the approaching food pellet.

(4) In a subset of mice, we trimmed their whiskers to test whether the mice used their whiskers to detect the food pellet. However, this did not have any effect on the cued reaching behaviour. Therefore, we did not trim the whiskers of all mice.

(5) The behavioural box was enclosed and completely dark to prevent the mouse from seeing the pellet.

We conducted numerous control experiments to determine whether each mouse responded to the optogenetic cue (Extended Data Fig. 4). In cases in which the mouse failed these controls, we excluded the entire behavioural session (see ‘Behavioural sessions included or excluded’).

Video recording of the behaviour

We acquired video of the mice behaving using two infrared cameras (Extended Data Fig. 1a). The first infrared camera acquired the behaviour continuously at 30 frames per second (fps; Supplementary Videos 1–12). This camera sent the video to a DVR that logged the video onto a micro-SD card. The second infrared camera (Flea3 FLIR) acquired the behaviour at a higher frame rate: 255 fps. This high-speed camera acquired chunks of video beginning 1 s before each cue and continuing for 7.5 s after each cue with a gap in video acquisition between trials. This high-speed camera logged the video to a computer running the acquisition software FlyCapture2.

Triangulating the paw position in 3D

To triangulate the paw position in 3D, we placed two mirrors around the mouse: one to the side of the mouse and one below the mouse (Fig. 1b and Extended Data Fig. 1a). These two mirrors gave orthogonal views, one from the side and the other bottom-up, of the paw during the reach (Fig. 1b). The high-speed infrared camera (Flea3 FLIR) was positioned so as to be able to see the paw from a top-down view and also, in the same frame, these two mirrors. We used DeepLabCut⁵¹ to track the 2D position of the paw in each mirror. We then combined data from these orthogonal views to determine the paw position in 3D.
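As a minimal sketch of how two orthogonal 2D views can be merged into one 3D point, the Python below assumes that both mirror views have already been calibrated to a common scale and origin, that the side view yields (forward, vertical) coordinates and that the bottom-up view yields (forward, lateral) coordinates. These axis conventions, the averaging of the shared forward axis and the function name `combine_views` are illustrative assumptions, not the procedure used in the analysis code.

```python
def combine_views(side_xz, bottom_xy):
    """Merge two orthogonal 2D paw positions into one 3D position.

    side_xz   : (x_forward, z_vertical) tracked in the side mirror
    bottom_xy : (x_forward, y_lateral) tracked in the bottom mirror
    ASSUMPTION: both views share the forward axis and the same units.
    """
    x_side, z = side_xz
    x_bottom, y = bottom_xy
    # The forward axis is visible in both views; average the two estimates.
    x = 0.5 * (x_side + x_bottom)
    return (x, y, z)

paw_3d = combine_views((10.0, 4.0), (12.0, -3.0))
```

Disagreement between the two forward-axis estimates could also serve as a rough check on tracking quality for a given frame.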

Optogenetic cue

We used an optogenetic activation of corticostriatal neurons in the visual cortex as the food-predicting cue (Extended Data Fig. 2). To activate these corticostriatal neurons, we positioned the output of a fibre-coupled LED just above the thinned skull above the visual cortex of the left hemisphere. We placed a small U-shaped loop of clay around the fibre tip to confine the LED-emitted light to the area just above the skull. The fibre diameter was 1 mm. The fibre emitted 40 mW of blue light (473 nm). We controlled the LED with signals from the Arduino. The duration of the cue was 250 ms (step pulse). In some of the mice, we used the red light-activated opsin ChrimsonR⁵² instead of ChR2 (ref. ⁵³). Stimulation conditions were identical other than the use of 35 mW of 650-nm light for optogenetic activation. We did not observe any differences in the cued reaching between mice with ChrimsonR or ChR2 as the optogenetic activator in the visual cortex (compare Extended Data Fig. 4a with Extended Data Fig. 4d), and hence we combined these two groups of mice, unless otherwise specified.

LED distractor

A distractor LED was positioned a few centimetres above the head of the mouse (Extended Data Fig. 1a). This LED flashed randomly with the same duration as the cue. The distractor LED was the same blue colour as the cue (473 nm). The distractor LED was too far away from the skull to optogenetically activate any neurons in the visual cortex. We controlled the distractor LED by signals from the Arduino. The duration of the distractor was 250 ms (step pulse).

Blocked skull control

To investigate whether the reach is cued by the optogenetic activation of the visual cortex, we performed the following control. In expert mice that reliably reached to the optogenetic cue, we blocked the tip of the optical fibre conveying blue light from the LED to the thinned skull over the visual cortex, centred on the primary visual cortex (V1). We inserted a small, thin piece of clay between the tip of the optical fibre and the skull. Blue light was still able to exit the fibre tip, but this blue light did not penetrate the skull. The optogenetic cue-triggered reaches were abolished by this procedure (Extended Data Fig. 4f), indicating that blue light must penetrate the brain to trigger the cued reach.

Synchronizing the video with Arduino events

To synchronize the video of the mouse behaviour with Arduino events, we taped two small infrared LEDs to the front face of each camera. These infrared LEDs emitted light that was invisible to the mouse but detected by the infrared camera. One infrared LED turned on when the cue turned on. The other infrared LED turned on when the distractor LED turned on. Other behavioural events, for example, food pellet presentation, were directly recorded by the camera. Therefore, all relevant behavioural events were acquired along with the mouse behaviour and in the same frames as the mouse behaviour. Moreover, because the distractor LED flashed at random intervals, the pattern of this signal provided a unique sequence during each hour-long training session that enabled the alignment of all systems receiving a copy of the distractor LED signal.
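One way in which a random on/off sequence can be used to align two recordings is to search for the lag that maximizes their overlap. The Python sketch below illustrates this idea for two binary traces sampled at the same rate; the brute-force lag search, the equal sampling rates and the function name `best_lag` are assumptions made for illustration and are not taken from the analysis code.

```python
def best_lag(reference, other, max_lag):
    """Return the lag (in samples) at which other[i + lag] best matches reference[i].

    Both inputs are sequences of 0/1 values (LED off/on) at the same sampling rate.
    """
    best, best_score = 0, -1
    for lag in range(-max_lag, max_lag + 1):
        score = 0
        for i, r in enumerate(reference):
            j = i + lag
            # Count samples where the LED is on in both traces.
            if r and 0 <= j < len(other) and other[j]:
                score += 1
        if score > best_score:
            best, best_score = lag, score
    return best

# Toy example: the second trace is the first one delayed by 7 samples.
led = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
delayed = [0] * 7 + led
lag = best_lag(led, delayed, max_lag=10)
```

Because the distractor flashes at random intervals, the overlap has a single clear maximum at the true offset, which is what makes the sequence usable as a shared clock.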

Processing the 30-fps video

To process the 30-fps video, we used custom code written in MATLAB and Python. In brief, the user first drew zones over six regions of the video frame: cue infrared LED, distractor infrared LED, perch zone, reach zone, pellet zone and eat zone (Extended Data Fig. 1c). The first two zones (cue infrared LED and distractor infrared LED) were used to synchronize Arduino events to the video of mouse behaviour (see previous section ‘Synchronizing the video with Arduino events’). The perch zone detected movement within the region where the paw rests before the reach. The reach zone detected movement of the paw into the zone between the resting position of the paw and the pellet (Extended Data Fig. 1d). The pellet zone detected the presence of the pellet directly in front of the mouse (Extended Data Fig. 1e). The eat zone detected chewing as an approximately 7-Hz oscillation of the jaw (Extended Data Fig. 1f). Behavioural events were defined by combining behavioural features detected in these various zones. For example, a successful reach was defined as a reach to the pellet, leading to a displacement of the pellet and followed by a long period of chewing (more than several seconds). A drop was defined as a reach to the pellet, leading to a displacement of the pellet and followed by no chewing. A reach that missed the pellet was defined as a reach without dislodging the pellet (this was a rare reach type). A pellet-missing reach was defined as a reach made when the pellet was absent. Failed reaches included drops, reaches that missed the pellet and pellet-missing reaches. A support vector machine was trained to separate the successes from the drops based on intensity data in the reach, pellet and eat zones. This support vector machine was applied to improve the discrimination of successes and drops. The automated behavioural classification pipeline was 96% accurate at classifying successes, 91% accurate at classifying drops and 98% accurate at classifying misses (Extended Data Fig. 1g).
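The reach definitions above amount to a small decision rule, sketched here in Python. The sketch covers only the rule-based definitions and does not reproduce the support vector machine step. The names `ReachType` and `classify_reach` are hypothetical, and the 3-s chewing threshold is an assumed stand-in for ‘more than several seconds’.

```python
from enum import Enum

class ReachType(Enum):
    SUCCESS = "success"
    DROP = "drop"
    MISS = "miss"
    PELLET_MISSING = "pellet_missing"

def classify_reach(pellet_present, pellet_displaced, chewing_s, min_chew_s=3.0):
    """Rule-based reach label.

    pellet_present   : pellet was in front of the mouse when the reach began
    pellet_displaced : the reach dislodged the pellet
    chewing_s        : seconds of chewing detected after the reach
    min_chew_s       : ASSUMED threshold for a 'long period of chewing'
    """
    if not pellet_present:
        return ReachType.PELLET_MISSING
    if not pellet_displaced:
        return ReachType.MISS
    if chewing_s >= min_chew_s:
        return ReachType.SUCCESS
    # Pellet displaced but little or no chewing followed.
    return ReachType.DROP

FAILED = {ReachType.DROP, ReachType.MISS, ReachType.PELLET_MISSING}
```

In this sketch, any displaced pellet followed by chewing shorter than the threshold is labelled a drop, which is a simplification of the ‘no chewing’ criterion.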

Measuring the accuracy of the automated pipeline

To measure the accuracy of the automated behavioural classification pipeline, we compared the output of the automated code pipeline to manually classified reaches (Extended Data Fig. 1g).
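A comparison of this kind can be sketched as a per-class agreement score. The Python below assumes that accuracy for a class means the fraction of manually labelled reaches of that class that received the same automated label; this definition and the function name `per_class_accuracy` are assumptions, because the exact metric is not specified here.

```python
from collections import Counter

def per_class_accuracy(manual, automated):
    """Fraction of manually labelled reaches of each class that the
    automated pipeline labelled identically.

    manual, automated : equal-length sequences of labels, one per reach
    """
    totals = Counter(manual)
    correct = Counter(m for m, a in zip(manual, automated) if m == a)
    return {label: correct[label] / totals[label] for label in totals}

# Toy labels for illustration only (not data from the study).
manual = ["success", "success", "drop", "miss"]
automated = ["success", "drop", "drop", "miss"]
scores = per_class_accuracy(manual, automated)
```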

Processing the 255-fps video

The high-speed video was processed using DeepLabCut⁵¹ to track the paw trajectory in 2D. The 2D positions from two perpendicular mirrors were combined to determine the position of the paw in 3D.

Virus injection

We diluted all AAV to a titre of 10¹³ gc ml⁻¹ or lower. The following viruses were used: pAAV-EF1a-mCherry-IRES-Flpo (Addgene #55634; packaged in AAV2/retro); pAAV-Ef1a-fDIO hChR2(H134R)-eYFP (Addgene #55639; packaged in AAV2/1); AAV2/8-EF1a-fDIO-ChrimsonR-mRuby2-KV2.1TS (modified from Addgene #124603); pAAV-hSyn1-SIO-stGtACR2-FusionRed (Addgene #105677; packaged in AAV2/8); and pAAV-hSyn-dLight1.1 (Addgene #111066; packaged in AAV2/9).

Age of mice for virus injection

We used adult mice older than 40 days of age.

Injection of the AAV carrying retro-Flp into the pDMSt

We injected 300 nl of AAV2/retro-EF1a-mCherry-IRES-Flpo into the pDMSt bilaterally. We targeted the pDMSt at 0.58 mm posterior, 2.5 mm lateral and 2.375 mm ventral of bregma. We lowered the virus-containing pipette (pulled glass pipette) to 0.05 mm below the target site, before retracting the pipette to the target site, waiting 2 min, and then injecting virus at a speed of 30 nl min⁻¹. After the injection, we waited 10 min before withdrawing the pipette from the brain.

Injection of the AAV carrying Flp-dependent channelrhodopsin

We injected 300 nl of AAV2/1-Ef1a-fDIO-ChR2-eYFP into V1 of the left hemisphere. We targeted V1 at 3.8 mm posterior of bregma, 2.5 mm lateral of bregma and 0.65 mm ventral of the pia. After lowering the pipette to the target site, we waited 2 min before injecting. If we detected any leak of the virus out of the cortex, we lowered the pipette another 0.05 mm. We waited 10 min after the injection before withdrawing the pipette from the brain.

Injection of the AAV carrying Flp-dependent ChrimsonR

We injected 300 nl of AAV2/8-EF1a-fDIO-ChrimsonR-mRuby2-KV2.1TS, where TS indicates soma-targeted, into V1 of the left hemisphere. We targeted V1 as described in a previous section (‘Injection of the AAV carrying Flp-dependent channelrhodopsin’).

Surgical virus injection

We prepared all mice for surgery under isoflurane anaesthesia, as previously described⁵⁴,⁵⁵. In brief, after stereotactically flattening the skull, we drilled the hole in the skull, inserted the virus pipette to the target site, injected the virus, retracted the virus pipette and then sutured the skin. Orally administered carprofen or subcutaneous injections of ketoprofen were used as the analgesic. Mice were allowed to recover for at least 3 weeks before we implanted the headframe.

Headframe implant and thinning skull over V1

We used isoflurane anaesthesia during the surgery and maintained the temperature of the animal using a closed-loop, thermoregulating heating pad. We covered the eyes in lubricant, removed the hair from the scalp, cleaned the scalp and cut the skin to expose the skull bilaterally around the midline from behind the lambdoid suture to just anterior of bregma. We stereotactically flattened the skull. We used a bone scraper and scalpel blade to scrape and score the skull. We thinned a 1.5 mm by 1.5 mm square of skull centred on V1 using a bone drill by hand. We put a thin layer of Vetbond onto the skull. We positioned the headframe, a thin bar, behind the lambdoid suture and perpendicular to the midline suture, so that the edges of the headframe protruded laterally just in front of the ears of the mice. We glued the headframe to the skull using Krazy Glue. The Krazy Glue is transparent, allowing light to access the thinned skull over V1. After the glue dried, we built up layers of opaque dental cement over all regions of the skull, except the 1.5 mm by 1.5 mm square centred on V1. We built up dental cement around the edges of this 1.5 mm by 1.5 mm square of thinned skull to create a pocket for the placement of the tip of the LED-coupled optical fibre. We used oral carprofen or subcutaneous ketoprofen as the analgesic. We allowed the animals to recover from the surgery for at least 5 days before beginning behavioural training.

Definition of d′

We defined the discriminability index used to measure behavioural performance (d′) as

$$d^\prime = z(\mathrm{hit}) - z(\mathrm{FA})$$

where z(hit) is the Z-score transformation of the hit rate, and z(FA) is the Z-score transformation of the false alarm rate. The hit rate represents the likelihood of observing one or more reaches right after the cue. Graphically, on a curve showing the distribution of the number of reaches in this time window, the hit rate corresponds to the fraction of the area under the curve that lies beyond a certain threshold (one reach in our case). As the hit rate goes up, more and more of the curve is above the threshold and our Z-score increases. We can use the inverse of the cumulative distribution function to calculate the Z-score associated with the hit rate. Note that scaling the curve or moving its mean, assuming the same transformation is applied to the threshold (one reach), does not change that fraction of the area under the curve. Thus, we can use the inverse of the standard normal cumulative distribution function to calculate the Z-score from the hit rate. We defined a false alarm as one or more reaches in the time window before the cue. As the hit rate goes up, z(hit) increases, and analogously, as the false alarm rate goes up, z(FA) increases. As the false alarm rate goes down and the curves for hits and false alarms become easier to discriminate, z(FA) decreases. Thus, a larger difference in the amount of reaching after the cue relative to before the cue produces a larger $d^\prime$. This is why $d^\prime$ is called the discriminability index. It captures how discriminable two curves are, accounting for both mean and variance. A positive $d^\prime$ indicated more reaches after versus before the cue. To calculate the hit rate, we measured reach rates in the 400-ms time window immediately after cue onset. In Figs. 1 and 3, we used two different time windows before the cue to calculate two false alarm rates. The first false alarm window was 400 ms in duration beginning 400 ms before the cue. The second false alarm window was 400 ms in duration beginning 1 s before the cue. We calculated a $d^\prime$ for each false alarm window, then we used whichever $d^\prime$ was lower. This ensured that we did not miss any preemptive reaching, which should decrease $d^\prime$. In Fig. 2, we used the time window 400 ms in duration beginning 400 ms before the cue to calculate the false alarm rate.
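The calculation can be sketched in a few lines of Python using the inverse of the standard normal cumulative distribution function from the standard library. The function names are hypothetical, and the clipping of rates away from exactly 0 and 1 (where the Z-score is undefined) is an assumption added to keep the sketch runnable; how such edge cases were handled is not stated here.

```python
from statistics import NormalDist

def z(p, eps=1e-3):
    """Z-score of a rate via the inverse standard normal CDF.

    ASSUMPTION: rates are clipped to [eps, 1 - eps] so that 0 and 1 stay finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)

def d_prime(hit_rate, fa_rate):
    """d' = z(hit) - z(FA)."""
    return z(hit_rate) - z(fa_rate)

def conservative_d_prime(hit_rate, fa_rates):
    """Compute d' for each false alarm window and keep the lowest value."""
    return min(d_prime(hit_rate, fa) for fa in fa_rates)

# Toy rates for illustration only (not data from the study).
value = conservative_d_prime(0.8, [0.1, 0.3])
```

Taking the minimum across windows means that the window with the higher false alarm rate determines the reported $d^\prime$.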

Defining learning stages

We defined beginner as any session with $d^\prime < 0.25$. We defined intermediate as any session with $0.25\le d^\prime < 0.75$. We defined expert as any session with $d^\prime \ge 0.75$.
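These thresholds can be written as a small helper, sketched below in Python; the function name `learning_stage` is hypothetical.

```python
def learning_stage(d_prime):
    """Map a session's d' to the learning stage defined in the text."""
    if d_prime < 0.25:
        return "beginner"
    if d_prime < 0.75:
        return "intermediate"
    return "expert"

stages = [learning_stage(d) for d in (0.1, 0.25, 0.5, 0.75, 1.2)]
```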

Behavioural sessions included or excluded

Because video analysis is computationally intensive, we did not analyse data from every session. Instead, we analysed data from every other day for each mouse, except for mice used to plot the learning curves or when otherwise specified. In these cases, daily analysis was performed. We have included data from all analysed sessions in our figures and statistics. However, we excluded all the behavioural data collected by one mouse trainer who set up the behavioural rig improperly (n = 5 mice).

To eliminate the early motor learning stage, when the mouse is still in the process of learning how to grab food pellets (Extended Data Fig. 3), we defined day 1 for the learning curves as the first day when the following two criteria were met: (1) the mouse successfully grabbed and consumed 20 or more pellets during a session lasting 45 min or longer. (2) Pellet occupancy (as described in the previous section ‘Training mice to associate a cue with the food pellet’) was 60% or less. This second criterion ensures that the mouse experiences both successful reach attempts when the pellet is present after the cue and unsuccessful reach attempts when the pellet is absent before the cue.
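The two criteria for day 1 can be expressed as a simple search over a mouse's sessions in chronological order. In the Python sketch below, the dictionary keys and the function name `first_learning_day` are hypothetical and do not correspond to the actual data format.

```python
def first_learning_day(sessions):
    """Return the index of the first session that meets both criteria, or None.

    sessions : chronological list of dicts with (hypothetical) keys
               'pellets_eaten', 'duration_min' and 'pellet_occupancy' (0-1)
    """
    for idx, s in enumerate(sessions):
        enough_success = s["pellets_eaten"] >= 20 and s["duration_min"] >= 45
        low_occupancy = s["pellet_occupancy"] <= 0.60
        if enough_success and low_occupancy:
            return idx
    return None

# Toy sessions for illustration only (not data from the study).
sessions = [
    {"pellets_eaten": 25, "duration_min": 60, "pellet_occupancy": 0.80},
    {"pellets_eaten": 12, "duration_min": 60, "pellet_occupancy": 0.30},
    {"pellets_eaten": 31, "duration_min": 50, "pellet_occupancy": 0.30},
]
day1_index = first_learning_day(sessions)
```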

If we observed any obvious cheating behaviour, that is, preemptive reaching before the cue at a level above the spontaneous baseline, we excluded the entire session from analysis. This rarely occurred; however, in some cases, the mouse appeared able to consistently detect the approaching pellet without using the cue, despite our extensive efforts to mask the presentation of the pellet. If mice could detect the pellet approaching, they always reached before the cue. Mice never patiently waited over the 0.22-s delay between final pellet presentation and the cue onset. Thus, we were able to detect with high certainty any preemptive reaching (that is, cheating) behaviour.

Strategy for suppressing pDMSt neural activity

Direct optogenetic inhibition is limited in its efficiency if the fraction of cells that express the inhibitory opsin and are exposed to sufficient light power is less than 100%. Rather than use a direct optogenetic inhibition of SPNs, we developed an approach to silence the SPNs. The logic was as follows (Extended Data Fig. 5a). Some inhibitory interneurons have promiscuous connectivity and release the neurotransmitter GABA onto SPNs. We reasoned that it might be possible to express an activating opsin in a subset of inhibitory interneurons with the result of strongly inhibiting a very large fraction of SPNs. We targeted the striatal interneurons using the NKX2.1–Cre transgenic mouse line. Approximately 90% of the striatal interneurons express the transcription factor NKX2.1 during development, and SPNs do not express NKX2.1. However, many other neuron types, outside of the striatum, also express NKX2.1 during development. Therefore, we chose an intersectional approach to target the NKX2.1⁺ cells within the pDMSt specifically. We used Cre recombinase to target the NKX2.1⁺ cells, and we used Flp recombinase to target the pDMSt. First, we crossed the NKX2.1–Cre transgenic mouse line (Jackson Labs stock #008661) with the Cre-On and Flp-On ReaChR transgenic mouse line (R26 LSL FSF ReaChR-mCitrine, Jackson Labs stock #024846), which expresses a red-activatable variant of channelrhodopsin (ReaChR⁵⁶,⁵⁷) only when both recombinases, Cre and Flp, are present. In the double transgenic offspring, the Cre within NKX2.1⁺ cells makes ReaChR expression dependent only on the presence of Flp. Second, we injected Flp recombinase into the pDMSt (see the section ‘Injection of the AAV carrying retro-Flp into the pDMSt’). Diffusion limited the spread of Flp around the injection site. As a consequence, all infected neurons in the pDMSt expressed Flp, but only the infected NKX2.1⁺ interneurons also expressed ReaChR (Extended Data Fig. 5b). This led to a high level of ReaChR expression in the striatal interneurons but not in SPNs. Moreover, retro-Flp infected the corticostriatal cue neurons. This enabled the expression of both Flp-dependent ChR2 in corticostriatal projection neurons and Cre-dependent and Flp-dependent ReaChR in striatal interneurons.

Fibre implant surgery to optically access the pDMSt

To illuminate the pDMSt for optogenetic manipulations or dLight1.1 (ref. ⁵⁸) fibre photometry, we chronically implanted optical fibres over the pDMSt. We prepared the mice for surgery, as described above in the section ‘Headframe implant and thinning skull over V1’. We drilled two craniotomies above the pDMSt bilaterally (or one craniotomy for unilateral dLight fibre photometry). Each optical fibre was 2 mm long, 0.2 mm in diameter and had a 0.39 NA. We obtained these fibres from ThorLabs or Doric Lenses. We implanted each fibre pointing straight down, so that its tip would be situated at approximately 0.58 mm posterior, 2.3 mm lateral and 2.25 mm ventral of bregma. We glued the fibres to the skull using Loctite gel #454 and catalyst. Then, we built up dental cement around each optical fibre to provide more stability. The top of each fibre was coupled to an optical patch cord (0.39 NA), which connected to a laser for optogenetic stimulation or an LED for fibre photometry.

Illuminating the pDMSt for striatal silencing

For mice expressing ReaChR in the striatal interneurons of the pDMSt bilaterally, we coupled each implanted optical fibre (one per hemisphere) to a Y-fibre patch cord (0.39 NA) connected to a Coherent Obis laser producing red light (650 nm). We modulated the power of the laser using transistor-transistor logic (TTL) pulses originating from the Arduino that controlled the behavioural rig. The power emitted from each optical fibre tip was 5 mW. The duration of the red-light step pulse was 1 s. The onset of the red-light pulse preceded the onset of the cue by 5 ms.

Reaching to pDMSt inhibition alone

In mice trained to respond to the optogenetic cue, inhibiting the pDMSt without turning on the cue did not elicit reaching (Extended Data Fig. 5g). These mice did not experience pDMSt inhibition during training. However, when we trained the mice with pDMSt inhibition overlapping the cue during learning (either consistent pDMSt inhibition at every presentation of the cue or randomly interleaved pDMSt inhibition), infrequently (n = 5 mice out of 21 mice), a mouse seemed to learn to respond at a delay to pDMSt inhibition alone (for example, ‘example mouse C’ in Extended Data Fig. 7). To test whether the mouse responded to the cue or pDMSt inhibition alone, in a small fraction of trials, we inhibited the pDMSt without turning on the cue. Only 5 out of the 21 mice exhibited reaching to pDMSt inhibition alone. We did not exclude any mice based on this and included all the mice in the figures. However, we did verify that including or excluding these five mice did not qualitatively change the results (not shown). The reaching to pDMSt inhibition alone was variable day to day. It is possible that this small subset of the mice (n = 5) reached to a post-inhibitory rebound after pDMSt inhibition.

Testing optogenetic strategy for pDMSt silencing

To assess whether ReaChR-mediated activation of NKX2.1⁺ striatal interneurons elicits inhibitory, GABAergic currents onto SPNs, we conducted acute slice electrophysiology in the pDMSt (Extended Data Fig. 5c). We prepared coronal slices containing the pDMSt from adult NKX2.1–Cre crossed to Cre-ON-Flp-ON-ReaChR double transgenic mice that had received AAV retro-Flp virus injections into the pDMSt over 2.5 weeks before. For details on the slicing protocol, refer to ‘In vitro slice electrophysiology’ below. We obtained whole-cell recordings of putative SPNs, which did not express ReaChR–mCitrine (as described in ‘Strategy for suppressing pDMSt neural activity’). The cells were held at 0 mV in voltage-clamp mode to isolate inhibitory currents. We illuminated the slice with red light (6–7 mW from a red-orange laser emitted at 590 nm). Upon illuminating the slice, we observed clear, fast and reliable outwards currents in the SPNs, consistent with light-induced GABAergic synaptic transmission from striatal interneurons (Extended Data Fig. 5c). To confirm the GABAergic nature of these currents, we applied 10 µM gabazine to the slice, which abolished the outwards current (Extended Data Fig. 5c). We recorded from a total of eight cells within the zone of ReaChR–mCitrine expression and two cells located outside of this zone (Extended Data Fig. 5c).

In vitro slice electrophysiology

The experiments closely followed the procedures outlined in previous studies⁵⁹,⁶⁰. Mice were anaesthetized using isoflurane inhalation and subsequently subjected to transcardial perfusion with ice-cold artificial cerebrospinal fluid (ACSF) composed of the following: 125 mM NaCl, 2.5 mM KCl, 25 mM NaHCO₃, 2 mM CaCl₂, 1 mM MgCl₂, 1.25 mM NaH₂PO₄ and 11 mM glucose, resulting in an osmolarity of 300–305 mOsm kg⁻¹. This perfusion was administered at a rate of 12 ml min⁻¹ for a duration of 1–2 min. The brain was removed from the skull, and we prepared 250-µm or 300-µm coronal brain slices in ice-cold ACSF. Slices were then placed in a holding chamber at 34 °C for 10 min, containing a choline-based solution with the following composition: 110 mM choline chloride, 25 mM NaHCO₃, 2.5 mM KCl, 7 mM MgCl₂, 0.5 mM CaCl₂, 1.25 mM NaH₂PO₄, 25 mM glucose, 11.6 mM ascorbic acid and 3.1 mM pyruvic acid. Following this initial incubation, the slices were transferred to a second chamber with ACSF also maintained at 34 °C for a minimum of 30 min. Subsequently, the chamber was shifted to room temperature for the duration of the experiment. During recordings, the temperature was maintained at 32 °C, and carbogen-bubbled ACSF was perfused at a rate of 2–3 ml min⁻¹. For whole-cell recordings, we used pipettes (2.5–3.5 MΩ) crafted from borosilicate glass (Sutter Instruments). Cs-based internal solutions were used for voltage-clamp measurements and contained the following components: 135 mM CsMeSO₃, 10 mM HEPES, 1 mM EGTA, 3.3 mM QX-314 (Cl⁻ salt), 4 mM Mg-ATP, 0.3 mM Na-GTP and 8 mM Na₂-phosphocreatine, with pH adjusted to 7.3 using CsOH, resulting in an osmolarity of 295 mOsm kg⁻¹.

In vivo extracellular electrophysiology acquisition systems

For in vivo electrophysiology, two different electrophysiology systems were used at two different times in the project. First, we used a Plexon Omniplex recording system with a Plexon headstage and Neuronexus probe (A1x32-Edge-10mm-20-177) to record from eight mice. The Neuronexus probe had 32 linearly arranged recording sites, spaced at a distance of 20 µm between each pair of sites. We acquired data at 40 kHz using the Plexon software PlexControl, passed to a DAC card and PC. Second, we used the WHISPER recording system, custom-built at Janelia Research Campus, to record from 19 mice. We used the same 32-channel Neuronexus probe. Data were amplified and multiplexed by the WHISPER acquisition system, and acquired by the National Instruments USB-6366, X series card. We sampled data at a rate of 25 kHz. We used the program SpikeGLX to acquire data.

In vivo extracellular electrophysiology recording configuration

While mice were briefly anaesthetized before the electrophysiology recording, we drilled a craniotomy to allow access to the brain (see ‘Recording from the visual cortex’ or ‘Recording from the pDMSt’). We covered the craniotomy with Kwik-Cast, allowed the animals to wake up and returned the mice to the home cage. At the time of the recording and after the head was restrained, we removed the Kwik-Cast covering the craniotomy. Then we built up a temporary well to contain saline at the site of the craniotomy. We used Kwik-Cast to build up this well after the head had been restrained. We placed sterile 1X PBS (pH 7.4) into this recording well. As the reference ground, we used a silver chloride wire resting in this well and in the saline. Thus, all electrode channels within the brain were referenced to this point outside of the brain. We inserted the probe into the brain. We recorded broadband neural activity while mice performed the behaviour. After the recording session, we computationally high pass-filtered the neural data above 300 Hz to remove low-frequency signals and to obtain the high pass-filtered extracellular activity including action potentials. We periodically replaced the 1X PBS during the recording session, as necessary, to prevent the well and craniotomy from drying out. After the end of the recording session and after removing the electrophysiology probe from the brain, we removed the Kwik-Cast well from the skull of the mouse and covered the hole in the skull with a small amount of fresh Kwik-Cast. We returned the mouse to the home cage.

In vivo acute recordings over days

We recorded acutely from the brain of each mouse over several consecutive days, no more than about 5 days. We then euthanized the mouse, extracted the brain and performed post-mortem histology.

In vivo electrophysiology in visual cortex

To record from the visual cortex in behaving mice, we anaesthetized already trained and already head-framed mice during an additional, brief surgery (5–10 min). We closed the eyes of the mouse during this brief surgery. We drilled a very small hole through the skull over V1. This hole had a diameter of about 0.05 mm. To do this, we first thinned the skull until it cracked, and then we used the bent tip of a needle to flake off bone until the brain was exposed. We covered the exposed brain using a drop of Kwik-Cast applied to the skull. At the time of the recording, we restrained the head of an awake mouse, removed the Kwik-Cast from the skull, built up a Kwik-Cast well around V1 (as described previously in the section ‘In vivo extracellular electrophysiology recording configuration’), added saline to this well, and then placed the electrophysiology probe into the brain, advancing the probe straight down into the brain at a rate of 3 µm s⁻¹ or slower. We targeted V1 at approximately 3.8 mm posterior and 2.5 mm lateral of bregma. We placed the probe in one of two positions: (1) we advanced the probe to the bottom of cortex (depth of about 850 µm), such that the deepest channel on the electrode array was just ventral of cortex, or (2) we advanced the probe until only the most superficial channel of the electrode array was still above the pia. We attempted to avoid any large blood vessels. We registered the depth of each channel according to the estimated bottom of the cortex (position 1) or the estimated top of the cortex (position 2). Although this is not the most accurate way to determine channel depth in the visual cortex, none of our scientific questions depended on exactly accurately registering the channel depths. We recorded extracellular activity while the mice behaved.

In vivo extracellular electrophysiology recording from the pDMSt

To record from the pDMSt in behaving mice, we restrained the head of an already reach-trained mouse. We briefly anaesthetized the mouse by positioning a nose cone, which provided a light level of isoflurane anaesthesia, over the snout of the mouse. We closed the eyes of the mouse and drilled a small hole through the skull. We covered the craniotomy with a small drop of saline (1X PBS, pH 7.4). We built up a well around this craniotomy using Kwik-Cast. We placed the electrophysiology probe and ground wire into this recording well and added more saline. We advanced the electrophysiology probe into the brain at a rate of 5 µm s⁻¹ or slower. We targeted the pDMSt at approximately 0.58 mm posterior, 2 mm lateral and 2.63 mm ventral of bregma. To record from mice with a chronically implanted optical fibre positioned over the pDMSt, we angled the electrode and advanced the electrode through the brain diagonally, until the recording electrode sat beneath the chronically implanted fibre. At the time of an earlier surgery, when we had implanted the headframe onto the skull of the mouse, we had stereotactically flattened the skull and left bregma visible by covering bregma only with Krazy Glue, which is transparent (the rest of the skull was covered with dental cement, except over the visual cortex). Hence, we could use bregma to calibrate the location of entry of the recording electrode. We used an electrode angle of 59° pointed ventral and posterior, with respect to horizontal. We used an electrode angle of 32° pointed lateral, with respect to the midline suture. This electrode track follows the dorsomedial edge of striatum, where the V1 axons terminate. We marked the recording site using dye on the recording probe (see ‘Marking the recording track’). While advancing the probe, we removed the nose cone providing a light level of isoflurane anaesthesia to the mouse and opened its eyes. The mouse recovered from anaesthesia and performed behaviour, as the recording electrode entered the pDMSt. We recorded pDMSt activity while the mouse behaved, for about 1 h. Afterwards, we retracted the recording probe, removed the Kwik-Cast recording well, covered the craniotomy with Kwik-Cast and returned the mouse to its home cage.

Marking the recording track in vivo

When recording from the pDMSt, we marked the recording track for viewing by post-mortem histology. On the last day of recording for each different pDMSt recording site, we coated the recording probe in DiI before inserting the probe into the brain. We quickly removed the PBS from the recording well to prevent the PBS from washing away the DiI. Once the probe had entered the brain but before advancing the probe to its final recording site, we added PBS back to the recording well. We always allowed the DiI-covered recording probe to sit at its final site for at least 15 min. We reconstructed the recording track by viewing DiI in histological sections (see ‘Post-mortem histology’).

Post-mortem histology

To extract the brain, we deeply anaesthetized the mouse using isoflurane. After testing to be sure that the animal did not respond to a toe pinch, the animal was decapitated. We very quickly extracted the brain from the skull and put the brain into 4% paraformaldehyde, where it remained at 4 °C between 36 h and 48 h. We then transferred the brain into 1X PBS (for sectioning using a fixed tissue slicer) or 30% sucrose (for sectioning using a freezing microtome). We made coronal sections that were 50 µm thick. We performed immunohistochemistry in two cases: (1) to locate SPNs (see ‘Immunohistochemistry against DARPP-32’), or (2) to visualize the location of dLight (see ‘Immunohistochemistry to visualize dLight’). Other fluorescent protein signals were not amplified. We mounted the brain sections on slides using a mounting medium containing DAPI. We sliced the entire forebrain starting at the posterior tip of V1 and moving anterior through all of the striatum. We imaged all brain sections and verified virus expression. We used an automated Olympus slide scanner to image the sections (either the VS120 or VS200).

Immunohistochemistry protocol

First, we washed the brain slices in 1X PBS with 0.1% Tween for 90 min. Second, we washed the slices in 10% Blocking One buffer overnight at 4 °C. Third, we added the primary antibody and let the slices sit overnight at 4 °C. Fourth, we washed the slices in 1X PBS with 0.3% Tween (0.3% PBST) three times for 10 min each. Fifth, we incubated the slices in 10% Blocking One with the secondary antibody overnight at 4 °C. Sixth, we washed the slices in 0.3% PBST three times for 10 min each. Last, we washed the slices in 1X PBS for at least 10 min, before mounting the slices.

Immunohistochemistry against DARPP-32

We performed immunohistochemistry against DARPP-32 (Extended Data Fig. 5b) to localize SPNs, using the Novus Biologicals primary antibody (NB110-56929; concentration 1 µg ml⁻¹) and an anti-rabbit secondary antibody conjugated to Alexa 594 (A-11012, Invitrogen; concentration 2 µg ml⁻¹).

Selecting new learning days

We defined new learning days as days during learning before the mouse was an expert ($d^\prime < 0.75$), when the d′ calculated for that day was higher than the d′ achieved by that mouse on any previous day. The last 10% of trials in each session were discarded, because mice disengaged from the task during this period.
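The selection rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used for the paper: the function name `new_learning_days` and its arguments are hypothetical, and it assumes that the first training day is excluded because it has no earlier day to compare against.

```python
def new_learning_days(dprime_by_day, expert_threshold=0.75):
    """Return 0-based indices of days that count as 'new learning days'.

    A day qualifies if its d' is below the expert threshold and is higher
    than the d' reached on every previous day.
    Assumption (not stated in the text): the first day never qualifies,
    because there is no previous day to compare against.
    """
    selected = []
    best_so_far = float("-inf")
    for day, d in enumerate(dprime_by_day):
        if day > 0 and d < expert_threshold and d > best_so_far:
            selected.append(day)
        # Track the best d' seen so far, including expert-level days, so that
        # later sub-threshold days cannot qualify once the mouse is an expert.
        best_so_far = max(best_so_far, d)
    return selected
```

For a hypothetical mouse with daily d′ values of 0.1, 0.3, 0.2, 0.5, 0.8 and 0.6, this sketch selects the second and fourth days only.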

Measuring the effects of pDMSt inhibition on reach phases

To test whether pDMSt inhibition had any effect on different phases of the reaching behaviour (that is, initial fast ballistic movement of the arm towards the pellet, grasping the pellet, supination of the paw and raising the paw with the pellet to the mouth; Extended Data Fig. 6a–e), we used a combination of DeepLabCut⁵¹ and manual quantification. To measure the trajectory of the initial fast ballistic movement of the arm towards the pellet, we plotted paw trajectories tracked using DeepLabCut⁵¹. To measure the duration of each phase of the reaching behaviour, we viewed the high-speed video and manually counted the number of frames belonging to each phase of the reach. The ∆t from the perch to pellet was the time required for the paw to move from its resting position to touching the pellet. The ∆t grasp was the time required for the fingers of the paw to close completely around the pellet. The ∆t grasp to mouth was the time required for the mouse to lift the pellet into the mouth.

Spike detection and single-unit sorting

We examined the raw physiology signal for periods when the mouse was chewing. Chewing sometimes produced large artefacts in the data that were easily identified. As mice chew at about 7 Hz, the chewing artefacts were periodic at 7 Hz, although these artefacts also contained high-frequency content. The artefacts were much larger than any spikes. We removed any chewing artefacts by subtracting the common mode signal across all physiology channels, because the chewing artefact was identical on all channels. We verified that any spikes detected during these artefacts were identical in shape and size to the spikes detected outside of these artefacts, for a number of example single units when only one large unit was recorded per channel. We filtered the physiology data between 300 Hz and 25 kHz. We then used UltraMegaSort to detect spikes and cluster single units, as described elsewhere⁵⁴,⁵⁵.
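As an illustration of common-mode subtraction, the following Python sketch removes the across-channel average from every channel at each sample. It is a minimal sketch under stated assumptions: the function name `subtract_common_mode` is hypothetical, and the use of the mean (rather than, for example, the median) as the common-mode estimate is an assumption, because the text does not specify the estimator.

```python
def subtract_common_mode(channels):
    """Subtract the across-channel mean from each channel, sample by sample.

    channels: list of equal-length lists of samples, one list per channel.
    Assumption: the common-mode signal is estimated as the mean across
    channels; the source text does not state which estimator was used.
    """
    n_channels = len(channels)
    n_samples = len(channels[0])
    # Common-mode estimate at each time point
    common = [sum(ch[t] for ch in channels) / n_channels for t in range(n_samples)]
    # A signal that is identical on all channels (such as a chewing artefact)
    # cancels; channel-specific signals (such as spikes) remain.
    return [[ch[t] - common[t] for t in range(n_samples)] for ch in channels]
```

One consequence of this design is that a large spike on a single channel leaks, scaled by 1/(number of channels), into the other channels with opposite sign; with 32 channels this leakage is small.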

Identifying putative SPNs

We identified putative SPNs as in ref. ³³. First, for each unit, we averaged all of its spikes to get the average waveform. Second, we defined the spike amplitude as the maximum size of the negative deflection. Third, we defined the width of the spike waveform at half-maximum (called ‘width’ in Fig. 5b) as the time delay between the falling and rising time points at half the spike amplitude. Fourth, we measured the average firing rate of the unit over the entire experiment. We used these features to classify the unit as one of the following types (see Fig. 5b for an example session with different unit types).

SPN: width of the spike waveform at half-maximum ≥ 0.22 ms and mean firing rate < 4 Hz
Tonically active neuron: width of the spike waveform at half-maximum ≥ 0.22 ms and mean firing rate ≥ 4 Hz
Fast spiking: width of the spike waveform at half-maximum < 0.22 ms and mean firing rate ≥ 1.25 Hz
Low-firing-rate thin: width of the spike waveform at half-maximum < 0.22 ms and mean firing rate < 1.25 Hz
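
The four criteria above amount to a two-feature decision rule, which can be sketched as follows. The function name `classify_unit` and the returned labels are hypothetical; the thresholds are those listed above.

```python
def classify_unit(half_width_ms, mean_rate_hz):
    """Classify a striatal unit from its spike half-width and mean firing rate.

    Thresholds follow the criteria listed in the text:
    half-width boundary of 0.22 ms; rate boundaries of 4 Hz (wide waveforms)
    and 1.25 Hz (thin waveforms).
    """
    if half_width_ms >= 0.22:
        # Wide waveforms: SPN if slow-firing, otherwise tonically active neuron
        return "SPN" if mean_rate_hz < 4.0 else "tonically active neuron"
    # Thin waveforms: fast spiking if the rate is high enough
    return "fast spiking" if mean_rate_hz >= 1.25 else "low-firing-rate thin"
```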

Defining the probability that a reach was preceded by the cue

We previously used d′ to represent the behaviour. d′ is a commonly used behavioural metric that compares reaching in the time window immediately after the cue (window A) to reaching in the time window before the cue (window B). However, reaches are sparse in this behaviour, and hence many trials are required to calculate a meaningful d′. The hit rate used to calculate d′ was essentially P(reach|cue). An alternative analysis approach is to define the probability that a reach was preceded by the cue, within some time window. We called this the probability P(cue|reach). We plotted P(cue|reach) to understand how the reaching changes within a single day’s training session (Fig. 2f). P(cue|reach) increased within the day’s training session. For the summary datasets across mice, we used the time window within 0.4 s of cue onset, for consistency with the d′ definition in Fig. 1. Thus, P(cue|reach) was the probability that a reach was preceded by the cue within a 0.4-s time window. However, when analysing the example session in Fig. 2e (top), summarized in Fig. 2f (top), we expanded the time window after the cue to 1.5 s, allowing us to calculate a meaningful P(cue|reach) for this single session. In contrast to P(cue|reach), the probability that a reach was followed by the cue (within 0.4 s) decreased within a single day’s training session (0.048 ± 0.003 over the first fourth of the session, and 0.038 ± 0.002 over the last half of the session, P = 0.01 from a two-proportion Z-test, n = 58 new learning days from 10 mice).
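A minimal Python sketch of P(cue|reach) follows. The function name `p_cue_given_reach` and the representation of the data as lists of event times in seconds are hypothetical; the sketch simply counts the fraction of reaches for which a cue onset occurred no more than `window` seconds earlier.

```python
from bisect import bisect_left, bisect_right

def p_cue_given_reach(reach_times, cue_times, window=0.4):
    """Fraction of reaches preceded by a cue onset within `window` seconds.

    reach_times, cue_times: event times in seconds on a common clock
    (a hypothetical data layout chosen for this sketch).
    """
    if not reach_times:
        return float("nan")
    cues = sorted(cue_times)
    n_preceded = 0
    for r in reach_times:
        # Is there any cue onset c with r - window <= c <= r ?
        lo = bisect_left(cues, r - window)
        hi = bisect_right(cues, r)
        if hi > lo:
            n_preceded += 1
    return n_preceded / len(reach_times)
```

Passing `window=1.5` corresponds to the expanded window used for the single example session.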

Control mice for illumination of the pDMSt during learning

To test whether silencing the pDMSt during learning affects behaviour, we trained two groups of mice at the same time (Fig. 3). The first group of mice (n = 9) experienced real silencing of the pDMSt. The second group of mice (n = 7) were controls that did not experience silencing of the pDMSt. These control animals were negative littermates from the NKX2.1–Cre transgenic mouse line crossed to the ReaChR transgenic mouse line. To test whether the learning deficit observed in the pDMSt silencing group was simply due to brain damage as a result of virus injections or fibre implants, we performed identical virus injection and fibre implant surgeries on the control mice. The experimenters performing surgeries and training the mice were blinded to the genotype of each mouse from before the first surgery and throughout training. The pDMSt silencing group and control group were handled identically. We used red light to illuminate the pDMSt bilaterally in the control mice, but this red light did not silence the pDMSt in the absence of ReaChR expression.

Illumination of the pDMSt during learning (loss of one mouse)

One mouse in the pDMSt silencing cohort in Fig. 3 had to be eliminated for health reasons, before switching the cohort to the ‘recovery’ training stage post-pDMSt inhibition.

Identifying sessions where the mouse learned

We identified training sessions in which the mouse improved at cued reaching over the course of the session by evaluating if d′ at the end of the session was more than 0.1 greater than d′ at the beginning of the session. To allow for cases in which the mouse improved either earlier or later in the session, we made three calculations:

$$\begin{array}{c}\Delta d_1^\prime = d_{\mathrm{last}\;75\%\;\mathrm{of}\;\mathrm{session}}^\prime - d_{\mathrm{first}\;25\%\;\mathrm{of}\;\mathrm{session}}^\prime \\ \Delta d_2^\prime = d_{\mathrm{last}\;50\%\;\mathrm{of}\;\mathrm{session}}^\prime - d_{\mathrm{first}\;50\%\;\mathrm{of}\;\mathrm{session}}^\prime \\ \Delta d_3^\prime = d_{\mathrm{last}\;25\%\;\mathrm{of}\;\mathrm{session}}^\prime - d_{\mathrm{first}\;75\%\;\mathrm{of}\;\mathrm{session}}^\prime \end{array}$$

If any of $\Delta d_1^\prime $, $\Delta d_2^\prime $ or $\Delta d_3^\prime $ was greater than 0.1, we classified the session as one in which the mouse learned.
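
The three-way split can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used for the paper. The function names are hypothetical, and the `dprime` helper uses a generic signal-detection form, z(hit rate) minus z(false-alarm rate) with rates clipped away from 0 and 1; the exact d′ definition used in the paper is given with Fig. 1 and is not reproduced here.

```python
from statistics import NormalDist

def dprime(hits, false_alarms):
    """Generic d' = z(hit rate) - z(false-alarm rate).

    hits, false_alarms: non-empty per-trial booleans (reach in the cued
    window, reach in the uncued window). Assumption: rates are clipped to
    [1/(2n), 1 - 1/(2n)] to avoid infinite z-scores; the paper's own d'
    definition may differ.
    """
    n = len(hits)

    def clipped_rate(flags):
        return min(max(sum(flags) / n, 0.5 / n), 1.0 - 0.5 / n)

    z = NormalDist().inv_cdf
    return z(clipped_rate(hits)) - z(clipped_rate(false_alarms))

def session_learned(hits, false_alarms, threshold=0.1):
    """Return (learned, deltas) using splits at 25%, 50% and 75% of the session."""
    n = len(hits)
    deltas = []
    for first_fraction in (0.25, 0.50, 0.75):
        k = int(n * first_fraction)  # trials in the 'first' part of the session
        early = dprime(hits[:k], false_alarms[:k])
        late = dprime(hits[k:], false_alarms[k:])
        deltas.append(late - early)
    # The session counts as a learning session if any split shows improvement
    return any(d > threshold for d in deltas), deltas
```

Using three splits rather than one makes the criterion insensitive to whether the improvement happened early or late in the session.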

Injections of muscimol into the superior colliculus

We injected 1.5 µg µl⁻¹ muscimol (M1523, Sigma-Aldrich) dissolved in 0.9% NaCl in ddH₂O (Extended Data Fig. 6g–k). Injections were stereotactically targeted to the superior colliculus at coordinates 4.6 mm posterior to bregma, 0.8 mm lateral to the midline and 1.9 mm deep. To avoid the sinus and a chronically implanted headframe, we used an angled approach (either 18° or 48° from vertical), advancing the pipette laterally to medially and dorsally to ventrally. We used two different injection systems: a Drummond NanoJect for four mice and a WPI injector for the remaining four mice. We briefly anaesthetized the mice, performed a small craniotomy (on the first injection day) and injected muscimol at 30–40 nl min⁻¹. After injection, we waited 2 min before retracting the pipette and waking the mouse. The total anaesthesia duration was less than 15 min. Mice were allowed to recover fully in their home cage for 10–20 min, resuming normal behaviour, before being transferred to the behavioural rig for 1-h-long cued reaching sessions. The mice were then returned to the home cage. We interleaved control days (no injection) with muscimol or saline injection days over several successive days.

Mice were excluded if they were unable to perform spontaneous reaches after the muscimol injection, as cue–reach associative memory could not be assessed. We titrated muscimol volumes to minimize the disruption to spontaneous reaching. The muscimol injection volumes in Extended Data Fig. 6g–k were 115 nl, 100 nl and 100 nl (mouse 1); 50 nl and 50 nl (mouse 2); 90 nl (mouse 3); and 20 nl (mouse 4). The saline injection volumes in Extended Data Fig. 6g–k were 100 nl saline plus dye (mouse 1); 70 nl saline (mouse 2); and 60 nl dye plus saline (mouse 4). We injected DiI as the dye. Three mice failed to recover spontaneous reaching on all muscimol injection days and were excluded. Another mouse was excluded, because it did not perform cued reaching on the control days. We were unable to perform a saline injection in mouse 3 or a dye injection in mouse 2, because the headframes came off, after which the mice were immediately euthanized. We processed all the brains as described in ‘Post-mortem histology’.

For the three animals that did not recover spontaneous reaching after the muscimol injection, we observed gross motor defects, including spinning in the home cage and, on two of the muscimol injection days, seizure-like activity manifest as running-like movements of the forelimbs and hindlimbs. (In this latter case, we immediately euthanized the mice.) The spinning behaviour resolved within a few hours, but during this time, the mice did not perform spontaneous reaches when placed into the behavioural rig and hence could not be used to collect data about the cue–reach association.

Statistics on muscimol injections into the superior colliculus

We used a linear mixed-effects model to assess whether muscimol injections affected a behavioural metric (that is, cued reach rate, uncued reach rate or d′; Extended Data Fig. 6k). To account for the non-independence of observations within the same mouse and potential baseline differences between mice, a random intercept was incorporated for each mouse. An overall intercept was also included to capture general trends. The model was

$$\mathrm{metric}_{ij}=\beta_0+\beta_1\times \mathrm{Condition}_{ij}+u_i+\epsilon_{ij}$$

where $\beta_0$ is the overall intercept, $\beta_1$ is the fixed-effect coefficient, $\mathrm{Condition}_{ij}$ indicates muscimol or control (including no injection and saline days) for the i-th mouse at the j-th observation, $u_i \sim \mathcal{N}(0,\sigma_u^2)$ represents the random intercept for the i-th mouse, and $\epsilon_{ij} \sim \mathcal{N}(0,\sigma_\epsilon^2)$ is the residual error, implemented in MATLAB using the fitlme function.

Statistics on learning curves with or without pDMSt inhibition

We used a linear mixed-effects model to assess whether pDMSt inhibition affected the change in d′ on days 15–20 relative to day 1. To account for the non-independence of observations within the same mouse and potential baseline differences between mice, a random intercept was incorporated for each mouse. An overall intercept was also included to capture general trends. The model was

$$\Delta d^\prime_{ij}=\beta_0+\beta_1\times \mathrm{Condition}_{ij}+u_i+\epsilon_{ij}$$

where $\beta_0$ is the overall intercept, $\beta_1$ is the fixed-effect coefficient, $\mathrm{Condition}_{ij}$ indicates the condition of control or pDMSt inhibition during learning for the i-th mouse at the j-th observation, $u_i \sim \mathcal{N}(0,\sigma_u^2)$ represents the random intercept for the i-th mouse, and $\epsilon_{ij} \sim \mathcal{N}(0,\sigma_\epsilon^2)$ is the residual error, implemented in MATLAB using the fitlme function. When plotting the learning curves, on days excluded from the analysis because the mouse cheated, we interpolated d′ using neighbouring days or filled in the d′ from the last day before cheating.

Natural visual discrimination behaviour

We designed a behavioural paradigm in which mice learned to discriminate between two visual stimuli: a cue, paired with food pellet delivery, and a distractor, unpaired with the pellet (Extended Data Fig. 8). Both stimuli were spatially unstructured and delivered via the same 1-mm-diameter optical fibre coupled to a 473-nm blue LED (maximum output of 40 mW) positioned several inches above the head of the mouse. The LED remained off during baseline periods and was activated only during stimulus presentation. The cue consisted of a gradual ramp in blue-light intensity, increasing from 0 mW to 40 mW over 0.5 s, with pellet delivery coinciding with the ramp onset. The distractor was a 6-Hz flicker, comprising six rapid light ramps (0–40 mW) over 1 s. The cue and distractor were randomly interleaved and presented with approximately equal probabilities.

Natural visual discrimination data analysis

Our analysis of the natural visual discrimination was analogous to our analysis of the optogenetic cue (Extended Data Fig. 8). We measured reach rates within a 400-ms window starting after the onset of either the cue or the distractor. To assess discrimination performance, we calculated the ‘rate ratio’: the ratio of the reach rate following the cue to the reach rate following the distractor. Histograms of the rate ratio were generated across days and across individual mice. We used a linear mixed-effects model to compare the rate ratio on days 10–15 as a function of the condition, that is, whether the animal experienced pDMSt inhibition during learning. To account for the non-independence of observations within the same mouse and potential baseline differences between mice, a random intercept was incorporated for each mouse. An overall intercept was also included to capture general trends. The model was

$$\text{Rate ratio}_{ij}=\beta_0+\beta_1\times \mathrm{Condition}_{ij}+u_i+\epsilon_{ij}$$

where $\beta_0$ is the overall intercept, $\beta_1$ is the fixed-effect coefficient, $\mathrm{Condition}_{ij}$ indicates the condition for the i-th mouse at the j-th observation, $u_i \sim \mathcal{N}(0,\sigma_u^2)$ represents the random intercept for the i-th mouse, and $\epsilon_{ij} \sim \mathcal{N}(0,\sigma_\epsilon^2)$ is the residual error, implemented in MATLAB using the fitlme function. To compare the rate ratio across mice as a function of the condition, we used the Wilcoxon rank-sum test.
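
For illustration, the rate ratio that enters this model can be computed from reach counts as in the following Python sketch. The function names and the count-based inputs are hypothetical, and the handling of a zero distractor reach rate (returning infinity or NaN) is an assumption, because the text does not say how such days were treated.

```python
def reach_rate(n_reaches, n_presentations, window_s=0.4):
    """Reaches per second within the analysis window, pooled over presentations."""
    return n_reaches / (n_presentations * window_s)

def rate_ratio(cue_reaches, n_cues, distractor_reaches, n_distractors, window_s=0.4):
    """Ratio of the reach rate after the cue to the reach rate after the distractor.

    Assumption: if the distractor reach rate is zero, return infinity when
    the cue reach rate is positive and NaN otherwise.
    """
    cue_rate = reach_rate(cue_reaches, n_cues, window_s)
    distractor_rate = reach_rate(distractor_reaches, n_distractors, window_s)
    if distractor_rate == 0:
        return float("inf") if cue_rate > 0 else float("nan")
    return cue_rate / distractor_rate
```

A rate ratio above 1 indicates that the mouse reached more often after the cue than after the distractor.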

Changes in behaviour from trial to trial

To examine trial-to-trial changes in behaviour that underlie learning, we selected training sessions in which the mouse learned (see ‘Identifying sessions where the mouse learned’). We then considered the individual cue presentations and reach attempts comprising these sessions. To determine how the outcome of one trial affected the next, we considered sequences of three neighbouring trials: trial n − 1, trial n and trial n + 1. This three-trial sequence analysis avoids issues of regression to the mean. We measured how behaviour changed from trial n − 1 to trial n + 1, contingent on the behavioural experience of trial n. We defined behaviour as a 2D quantity, the rate of reaching in the cued window versus the rate of reaching in the uncued window. The cued window was defined as the 400-ms time window immediately after cue onset. The uncued window was defined as the time window beginning 3 s before cue onset and ending 0.25 s before cue onset. To plot how the behaviour changed in this 2D space, we ran a bootstrap by resampling, with replacement, all trial sequences in which the behaviour of trial n matched a particular type (that is, cued success, cued failure, uncued success or uncued failure; see the next paragraph). If we began with m trials of this particular type, we resampled m trials at each iteration of the bootstrap. For each iteration of the bootstrap, we subtracted the average behaviour on trial n − 1 from the average behaviour on trial n + 1. This is represented by the following: mean(behaviour on trial n + 1 (resample i)) − mean(behaviour on trial n − 1 (resample i)), where resample i is the set of resampled trials for iteration i of the bootstrap. Thus, this bootstrap analysis represents the change in the joint distribution of cued and uncued reach rates. We plotted 100 runs of the bootstrap as the scatter plots in Fig. 4 (each dot is the result of one iteration of the bootstrap). In the top row of Fig. 4, we also plotted a shaded region that represents the 2D histogram of the change in this joint distribution, after running 1,000 iterations of the bootstrap and filtering the resulting 2D histogram with a Gaussian filter with standard deviation equal to 0.0096 along the x axis (Δreach rate uncued) and 0.024 along the y axis (Δreach rate cued).
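
The bootstrap over three-trial sequences can be sketched in Python as follows. This is a minimal sketch, not the analysis code used for the paper: the function name, the tuple-based data layout and the fixed random seed are all hypothetical choices made for illustration, and the Gaussian smoothing of the 2D histogram is not included.

```python
import random

def bootstrap_behaviour_change(sequences, n_iter=100, seed=0):
    """Bootstrap the change in 2D behaviour from trial n-1 to trial n+1.

    sequences: list of pairs ((uncued_prev, cued_prev), (uncued_next, cued_next)),
    holding the uncued and cued reach rates on trials n-1 and n+1, restricted
    to sequences whose trial n matched one behavioural type (hypothetical layout).
    Returns one (delta_uncued, delta_cued) point per bootstrap iteration.
    """
    rng = random.Random(seed)
    m = len(sequences)
    points = []
    for _ in range(n_iter):
        # Resample m sequences with replacement
        sample = [sequences[rng.randrange(m)] for _ in range(m)]
        mean_prev_uncued = sum(s[0][0] for s in sample) / m
        mean_prev_cued = sum(s[0][1] for s in sample) / m
        mean_next_uncued = sum(s[1][0] for s in sample) / m
        mean_next_cued = sum(s[1][1] for s in sample) / m
        # mean(behaviour on trial n+1) - mean(behaviour on trial n-1)
        points.append((mean_next_uncued - mean_prev_uncued,
                       mean_next_cued - mean_prev_cued))
    return points
```

Each returned point corresponds to one dot in a scatter plot of Δreach rate uncued (x axis) against Δreach rate cued (y axis).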

We classified the behavioural experience of trial n as one of four types:

(1) Cued success: on trial n, the mouse
  (i) Did not reach before the cue
  (ii) Made a successful reach within 1 s after cue onset

(2) Cued failure: on trial n, the mouse
  (i) Did not reach before the cue
  (ii) Made a failed reach (that is, dropped pellet, reached but failed to touch the pellet or the pellet was missing at the time of the reach) within 1 s after cue onset

(3) Uncued success: on trial n, the mouse
  (i) Did not reach before the cue
  (ii) Did not reach in the 1.5-s time window after the cue
  (iii) Made a successful reach between 3.5 s and 7 s after the cue (note that successful reaches are not possible before the cue, when the pellet is missing)

(4) Uncued failure: on trial n, the mouse
  (i) Made a failed reach before the cue
  (ii) And was not chewing at the beginning of the trial (we excluded trials when the mouse was chewing at the beginning of the trial, because, if the mouse had its forelimb outstretched to chew, the mouse could potentially detect the approaching pellet with its already outstretched forelimb)
  (iii) Or made a failed reach between 3.5 s and 7 s after the cue
  (iv) Did not reach in the 1.5-s time window after the cue
  (v) Did not make any successful reaches at any time in this trial (that is, all reaches were failures)
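
The four trial types can be expressed as a classification function, sketched below in Python. This sketch rests on several assumptions that are not stated in the text: reaches are represented as hypothetical (time relative to cue onset in seconds, success flag) pairs; the categories are checked in the order listed, so a trial with any successful reach within 1 s of the cue counts as a cued success; and the uncued failure criteria are read as ((i and ii) or iii) and iv and v.

```python
def classify_trial(reaches, chewing_at_start=False):
    """Classify trial n; returns a type label or None if no type matches.

    reaches: list of (time_from_cue_s, success) pairs (hypothetical layout).
    Assumed grouping for uncued failure: ((i and ii) or iii) and iv and v.
    """
    def outcomes_between(lo, hi):
        return [ok for t, ok in reaches if lo <= t <= hi]

    before_cue = [ok for t, ok in reaches if t < 0]
    first_1s = outcomes_between(0.0, 1.0)
    first_1p5s = outcomes_between(0.0, 1.5)
    late = outcomes_between(3.5, 7.0)

    if not before_cue and any(first_1s):
        return "cued success"
    if not before_cue and first_1s:
        # All reaches within 1 s of the cue were failures
        return "cued failure"
    if not before_cue and not first_1p5s and any(late):
        return "uncued success"

    failed_before = (not chewing_at_start) and any(not ok for ok in before_cue)
    failed_late = any(not ok for ok in late)
    all_failed = not any(ok for _, ok in reaches)
    if (failed_before or failed_late) and not first_1p5s and all_failed:
        return "uncued failure"
    return None
```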

To measure the effects of pDMSt optogenetic inhibition, we compared three-trial sequences when the optogenetic inhibition was on or off in trial n (‘inhibition on’ or ‘inhibition off’). To ensure that the inhibition off trials were interleaved with the inhibition on trials, we took inhibition off trials that were followed by an inhibition on trial at the trial position n + 2, n + 3, n + 4 or n + 5. Analogously, to ensure that the inhibition on trials were interleaved with the inhibition off trials, we took inhibition on trials that were followed by an inhibition off trial at the trial position n + 2, n + 3, n + 4 or n + 5.
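
The interleaving criterion can be sketched in Python as follows. The function name and the per-trial boolean input are hypothetical; the offsets are those given above.

```python
def select_interleaved(inhibition_on, offsets=(2, 3, 4, 5)):
    """Select trial-n indices that are interleaved with the opposite condition.

    inhibition_on: per-trial booleans (True if inhibition was on).
    An 'off' trial n is kept if an 'on' trial occurs at n+2, n+3, n+4 or n+5;
    an 'on' trial n is kept if an 'off' trial occurs at one of those positions.
    Returns (off_indices, on_indices).
    """
    n_trials = len(inhibition_on)
    off_indices, on_indices = [], []
    for n, is_on in enumerate(inhibition_on):
        # Conditions of the trials at the allowed later positions
        followers = [inhibition_on[n + k] for k in offsets if n + k < n_trials]
        if is_on and any(not f for f in followers):
            on_indices.append(n)
        elif not is_on and any(followers):
            off_indices.append(n)
    return off_indices, on_indices
```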

Note that the time window of pDMSt optogenetic inhibition overlaps the cued success (Fig. 4c, first column) but does not overlap the uncued success (Fig. 4c, third column). This may explain why the pDMSt optogenetic inhibition disrupted the behavioural update after a cued success but not after an uncued success.

No outcome-independent behavioural change

To test whether there was any systematic change in the behaviour that did not depend on the behavioural experience of trial n, we plotted the change in behaviour from trial n − 1 to trial n + 1, given any type of trial n behavioural experience (Fig. 4b). Any type of trial includes trials when the mouse reached successfully, failed or did not reach. There was no systematic change.

Effect of pDMSt inhibition on the current trial

To test whether pDMSt inhibition affects the current trial, we plotted the change in behaviour from trial n − 1 to trial n + 1, given (1) any type of trial n behavioural experience, and (2) pDMSt inhibition during the cue on trial n + 1 versus no inhibition on trial n + 1 (Fig. 4d). pDMSt inhibition on trial n + 1 (beginning 5 ms before the cue and continuing for 1 s) did not produce a shift in behaviour from trial n − 1 to trial n + 1, consistent with data elsewhere in this paper showing no effect of pDMSt inhibition on the ongoing cued reaching behaviour (for example, Fig. 2d).

Varied timing of pDMSt inhibition

To determine when pDMSt neural activity was required for trial-to-trial behavioural updates, we varied the timing of the 0.5-s optogenetic inhibition relative to the cue and reach. Inhibition was applied at one of three time points: (1) starting 0.5 s before cue onset, (2) simultaneously with cue onset, or (3) 0.3 s after cue onset. For each inhibition timing, we analysed sequences of three consecutive trials (trial n − 1, n and n + 1) where the reach on trial n occurred at different times with respect to the pDMSt inhibition. Figure 4e shows the change in reaching behaviour from trial n − 1 to trial n + 1 for successful reaches on trial n. The y axis in Fig. 4e is identical to the y axis in Fig. 4a–d and represents the change in reach rate in a 400-ms window following cue onset. The circles represent the mean across trials, and the vertical lines show the standard error (mean ± s.e.m.). For clarity, the line representing the mean − s.e.m. was omitted. The black dots are when trial n was a control trial; the red dots are when trial n contained pDMSt inhibition. Successful reaches before cue onset led to a decrease in cued reaching on trial n + 1, whereas successful reaches after cue onset increased cued reaching on trial n + 1. To ensure sufficient reach counts for statistical power, we used different reach time windows based on reach frequency. For example, we needed to use a long 1.2-s window for low-frequency reaches before the cue. Hence, in the left panel of Fig. 4e, we used a 1.2-s-long reach time window. Because cued reaches occurred at a higher rate after the cue, we could use a shorter reach time window for the middle panel of Fig. 4e. We used a 0.2-s-long reach time window for the points at x axis positions 0.2 s, 0.3 s, 0.4 s and 0.5 s, but we had to use a longer reach time window of 0.5 s for the point at x axis position 0.75 s owing to lower reach counts. For the right panel of Fig. 4e, we used a 0.2-s-long reach time window for the points at x axis positions 0.2 s, 0.225 s, 0.25 s, 0.275 s, 0.3 s, 0.4 s and 0.5 s. We used a 1-s-long reach time window for the points at x axis positions 1 s, 1.1 s and 1.5 s owing to lower reach counts. Figure 4f displays the difference between the red and black points from Fig. 4e, plotted according to the time difference between the midpoint of pDMSt inhibition (middle of the 0.5-s window) and the midpoint of the reach time bin. We overlaid all the points from the panels in Fig. 4e to construct Fig. 4f.

Control for behaviour change (backwards time control)

If the change in behaviour from trial n − 1 to trial n + 1 depends on the behavioural experience of trial n, then the effect on trial n + 1 should be manifest forwards in time but not backwards in time. If trial n + 1 showed the same shift in behaviour when ‘time moved backwards’, this would suggest a correlational structure in the data but not any causal effect of the behavioural experience of trial n. To test this, instead of conditioning trial n + 1 on trial n, we conditioned trial n + 1 on trial n + 2. We measured the shift in behaviour from trial n − 1 to trial n + 1, that is, before the particular behavioural experience of trial n + 2. This abolished both the increase in cued reaching observed after a cued success and the increase in uncued reaching observed after an uncued success (Extended Data Fig. 9a).
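
The logic of this control can be illustrated with a minimal Python sketch (not the code used in the study). The function name `mean_shift`, the argument names and the use of one reach-rate value per trial are assumptions made for illustration. Setting `offset=0` conditions the shift from trial n − 1 to trial n + 1 on trial n; setting `offset=2` conditions the same shift on trial n + 2.

```python
import numpy as np

def mean_shift(reach_rate, condition, offset):
    """Mean change in reach rate from trial n-1 to trial n+1, taken over all
    trials n for which the trial at n+offset satisfies `condition`.

    offset=0 conditions on trial n (forwards in time).
    offset=2 conditions on trial n+2 (backwards-time control).
    """
    reach_rate = np.asarray(reach_rate, dtype=float)
    condition = np.asarray(condition, dtype=bool)
    n_trials = len(reach_rate)
    # Trial n needs a neighbour on each side, and trial n+offset must exist.
    n = np.arange(1, n_trials - max(1, offset))
    keep = n[condition[n + offset]]
    return float(np.mean(reach_rate[keep + 1] - reach_rate[keep - 1]))
```

If the behavioural experience of trial n drives the update, the value returned for `offset=0` should differ from zero, whereas the value for `offset=2` should not.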

Optogenetically inhibiting the pDMSt using GtACR2

We used a second, orthogonal optogenetic method to confirm that inhibiting the pDMSt disrupts the behavioural updates from trial to trial. We directly expressed soma-targeted GtACR2, a blue-light-stimulated inhibitory opsin, in SPNs. We injected an AAV carrying Cre-dependent GtACR2 (see ‘Viral injections of pDMSt GtACR2 and visual cortex ChrimsonR’) into the pDMSt bilaterally in the double transgenic offspring of a cross between the D1–Cre transgenic mouse line and the Adora2a–Cre transgenic mouse line. This led to expression of the inhibitory opsin GtACR2 in both direct and indirect pathway neurons of the pDMSt. We illuminated the pDMSt bilaterally with blue light from a 473-nm laser. The step-pulse illumination lasted 1 s and began 5 ms before cue onset. The power of the blue light at the tip of the patch cord was 8 mW. To activate the cue neurons in the visual cortex and avoid any antidromic activation of these visual cortex cue neurons by the blue light in the pDMSt, we expressed soma-targeted ChrimsonR in the cue neurons. ChrimsonR is a red-activatable excitatory opsin. We illuminated the thinned skull over the visual cortex with a red LED coupled to an optical fibre (output power of 35 mW and diameter of the optical fibre of 1 mm). The duration of red light illumination was 0.25 s. We used a constant step pulse of red light to activate the cue neurons. We interleaved the GtACR2-mediated inhibition of the pDMSt on random trials while mice learned to respond to the cue. We aimed to minimize confounds of GtACR2 axonal stimulation by using soma-targeted GtACR2, by using a low light power (8 mW), and by excluding entire sessions if the GtACR2 stimulation led to an increase in cued reaching of more than 20% of the control reach rate (excluded 43 of 87 sessions). We then performed the same trial-to-trial analysis as in Fig. 4. We observed qualitatively the same effects of inhibiting the pDMSt using GtACR2 (Extended Data Fig. 9d) as when we inhibited the pDMSt using ReaChR (Fig. 4).
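
The session exclusion criterion can be written as a short check. This is a minimal sketch, not the code used in the study; the function name `exclude_session` and the assumption that each session is summarized by one cued reach rate for control trials and one for GtACR2 trials are hypothetical.

```python
def exclude_session(control_rate, inhibition_rate, max_increase=0.20):
    """Return True if light delivery raised cued reaching by more than
    `max_increase`, expressed as a fraction of the control reach rate."""
    if control_rate <= 0:
        raise ValueError("control reach rate must be positive")
    return (inhibition_rate - control_rate) / control_rate > max_increase
```

A decrease in cued reaching, or an increase of 20% or less, leaves the session in the dataset.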

Viral injections of pDMSt GtACR2 and visual cortex ChrimsonR

We targeted the pDMSt and visual cortex for injections, as described above. We mixed 150 nl of the virus AAV2/8-hSyn-SIO-stGtACR2-FusionRed with 150 nl of the virus AAV2/retro-EF1a-mCherry-IRES-Flpo and injected 300 nl of this mixture into the pDMSt of each hemisphere. We injected 300 nl of the virus AAV2/8-EF1a-fDIO-ChrimsonR-mRuby2-KV2.1TS into V1.

Dopamine fibre photometry in the pDMSt (virus injections)

For the surgery protocol, see the section ‘Virus injections surgical details’. We unilaterally injected the pDMSt with AAV9-syn-dLight1.1. We injected the AAV2/retro-EF1a-mCherry-IRES-Flpo into the pDMSt at the same time. We mixed the Flp and dLight viruses in a ratio of 1:1. We then injected 300 nl of this mixture into the pDMSt. We targeted the pDMSt at 0.58 mm posterior, 2.5 mm lateral and 2.375 mm ventral of bregma. We then injected V1 with AAV2/8-EF1a-fDIO-ChrimsonR-mRuby2-KV2.1TS, as described in ‘Injection of the AAV carrying Flp-dependent ChrimsonR’. We chose to trigger the optogenetic cue using the red-light-activated ChrimsonR instead of the blue-light-activated opsin ChR2, in the case of these mice for dopamine fibre photometry, because we wanted to avoid any leak of blue light into the dLight1.1 excitation channel. We injected adult mice older than 40 days of age.

Dopamine fibre photometry in the pDMSt (optogenetic cue)

We activated the ChrimsonR-expressing neurons in the visual cortex as the optogenetic cue. See the section ‘Red light optogenetic cue’ for details.

Dopamine fibre photometry in the pDMSt (acquisition setup)

We implanted an optical fibre unilaterally over the pDMSt for dopamine fibre photometry. We implanted this fibre over the pDMSt ipsilateral to the virally expressing cue neurons in the visual cortex, because V1 provides a predominantly unilateral projection to the pDMSt (see the section ‘Fibre implants to optically access the pDMSt’ for details about the optical fibre implants and targeting of the pDMSt). We coupled the implanted fibre to a Doric Lenses patch cord (0.37 NA). This was coupled to a Doric Fluorescence MiniCube (iFMC5_E1(460-490)_F1(500-540)_E2(555-570)_F2(580-680)_S) for fluorescence imaging. The excitation LED wavelength was band-passed between 460 nm and 490 nm, and the emission light was band-passed between 500 nm and 540 nm for green imaging. The MiniCube also enabled red imaging. For red imaging, the excitation LED wavelength was between 555 nm and 570 nm, and the emission light was band-passed between 580 nm and 680 nm. We used the red channel only as an autofluorescence control. Because the heads of the mice were restrained, motion artefacts and artefacts relating to any bending of the patch cord were limited. We modulated the excitation light emitted by the LED at a constant frequency of 167 Hz, and we sampled the emission light at 2,000 Hz. We used a LabJack T7 to drive the LED and sample data from the photodetector on the Doric MiniCube. We used custom code in MATLAB to acquire data from and write data to the LabJack T7.

Dopamine fibre photometry and Z-score

First, we band-passed the collected green light between 120 Hz and 200 Hz (the excitation light was modulated at 167 Hz). Second, we used the MATLAB package Chronux to compute a spectrogram. Chronux uses the multi-taper method to calculate the spectrogram. We passed the following parameters to Chronux: (A) moving window of 0.1 s, shifted every 0.01 s to provide a smoothed output, (B) time-bandwidth product of 3, and (C) 2 tapers. Third, we measured the time-varying power to get a representation of the putative dopamine-dependent fluorescence of dLight1.1. We calculated the Z-score of this power using a rolling baseline window with a duration of 30 s. We median-filtered this Z-score.
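
The rolling-baseline Z-score can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used in the study. The function name `rolling_zscore` is hypothetical, and the choice of a trailing baseline (the 30 s preceding each sample) is an assumption, because the alignment of the baseline window is not specified above. With the spectrogram window shifted every 0.01 s, the power trace would have `fs = 100` samples per second.

```python
import numpy as np

def rolling_zscore(power, fs, baseline_s=30.0):
    """Z-score each sample against the mean and standard deviation of the
    preceding `baseline_s` seconds (trailing window: an assumption)."""
    power = np.asarray(power, dtype=float)
    win = int(round(baseline_s * fs))      # baseline length in samples
    z = np.full(power.shape, np.nan)       # undefined until a full baseline exists
    for i in range(win, len(power)):
        base = power[i - win:i]
        sd = base.std()
        if sd > 0:
            z[i] = (power[i] - base.mean()) / sd
    return z
```

The first 30 s of the output are left as NaN because no complete baseline is available for those samples.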

Immunohistochemistry to visualize dLight

We followed the protocol described above in the section ‘Immunohistochemistry protocol’. As the primary antibody, we used anti-GFP from Abcam (#ab13970; concentration 2.5 µg ml⁻¹). As the secondary antibody, we used an anti-chicken antibody conjugated to Alexa 488 from Thermo Fisher (A-11039; concentration 10 µg ml⁻¹).

Definition of the post-outcome period

A mouse found out whether a reach was successful at the moment when the paw encountered or failed to encounter the food pellet. If the mouse dropped the pellet, the drop typically occurred very shortly (less than 0.1 s) after the paw first encountered the pellet. We aligned reaches to the moment when the arm was outstretched. Hence, the outcome was manifest and known around this time point. Thus, we defined the post-outcome period (POP) as the 5-s time window beginning at the outstretched arm.

Trial type definitions for in vivo physiology analysis

We defined a cued reach as any reach occurring within 3 s of the cue onset. We defined an uncued reach as any reach occurring from 5 s to 16 s after the cue onset, a window that also captures reaches occurring before the onset of the next trial’s cue. As mice learned to respond to the cue, cued reaches became restricted to the brief 400-ms window immediately after the cue, but while mice were learning, there was greater variability in the timing of the apparently cued reach. Therefore, we did not analyse reaches between 3 s and 5 s after cue onset, because they were ambiguously either cued at a long delay or uncued. We defined a success as any reach resulting in successful pellet consumption. We defined a failure as any reach not resulting in successful pellet consumption, including cases when the mouse dropped the pellet, reached in a time window when the pellet was missing or reached without dislodging the pellet.
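
These definitions can be summarized as a small classification rule. The sketch below is illustrative only; the function name `classify_reach` and the treatment of the window edges (exactly 3 s, 5 s or 16 s) are assumptions, because the text does not state whether the boundaries are inclusive.

```python
def classify_reach(reach_time_s, consumed_pellet):
    """Label one reach. `reach_time_s` is the reach time in seconds after cue
    onset; `consumed_pellet` is True only if the pellet was eaten."""
    if 0 <= reach_time_s <= 3:
        timing = "cued"
    elif 5 <= reach_time_s <= 16:
        timing = "uncued"
    else:
        return None  # ambiguous (between 3 s and 5 s) or outside the analysed windows
    outcome = "success" if consumed_pellet else "failure"
    return f"{timing} {outcome}"
```

Reaches for which the function returns `None` are not analysed.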

Training and test sets

We aimed (step 1) to classify neuronal responses into different groups and (step 2) to use these groups to decode the behavioural trial type (that is, cued success, cued failure, uncued success or uncued failure) based on the neural activity (Fig. 5 and Extended Data Fig. 10). To avoid any circular logic or studying noise, we divided the dataset into training and test sets. The training set was a randomly selected 50% of trials acquired for each neuron of each behavioural trial type. For example, if we recorded 50 cued success trials, 40 cued failure trials, 30 uncued success trials and 60 uncued failure trials for neuron 1, then the training set was a random 25 cued success trials, a random 20 cued failure trials, a random 15 uncued success trials and a random 30 uncued failure trials for neuron 1. We used these same trials for all other neurons recorded simultaneously with neuron 1. The test set was the other half of trials. We performed all of step 1 (classification of neurons into different groups) based on the training set only (Extended Data Fig. 10). We then performed all of step 2 (decoding the behaviour based on the neural activity) based on the test set only (Fig. 5j–l). Hence, any patterns detected by the grouping in step 1 are only useful in step 2, if these patterns are consistent across the training and test sets and do not represent noise.
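
The split can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: the function name `split_trials` is hypothetical, trials are identified by their index, and an odd trial count places the extra trial in the test set.

```python
import random

def split_trials(trial_types, seed=0):
    """Return (train, test) lists of trial indices, with a random half of the
    trials of each behavioural trial type assigned to the training set."""
    rng = random.Random(seed)
    by_type = {}
    for idx, label in enumerate(trial_types):
        by_type.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_type.values():
        rng.shuffle(indices)
        half = len(indices) // 2
        train.extend(indices[:half])
        test.extend(indices[half:])
    return sorted(train), sorted(test)
```

For the example above (50, 40, 30 and 60 trials of the four types), the training set contains 25, 20, 15 and 30 trials, respectively. The same index lists would then be reused for all neurons recorded simultaneously.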

Two approaches to analyse the SPN activity patterns

We observed that some neurons were more active after a success than after a failure, whereas other neurons were more active after a failure than after a success. To investigate this observation more rigorously, we took two different approaches to organizing the neural activity patterns of the recorded SPNs. Approach 1 was fitting a GLM to the activity pattern of each neuron, followed by clustering of the GLM coefficients (Extended Data Fig. 10a–e). Approach 2 was performing a tensor regression to relate a tensor (or matrix) representing the activity patterns of the neurons to the different behavioural conditions (Extended Data Fig. 10f–l). Both approaches ultimately provided a similar view of the neural data, that is, one group of cells was more active after a success, and a second, different group of cells was more active after a failure, consistent with our observation by eye. We explain each of these two approaches in greater detail below. We used only trials in the training set for the GLM fitting and tensor regression (see ‘Training and test sets’).

Generalized linear model

We built a GLM to analyse how behavioural events predict the neural activity of each recorded neuron. The behavioural events were:

(1) Cue
(2) Distractor LED
(3) Reach (moment of arm outstretched)
(4) Successful outcome (moment of arm outstretched)
(5) Failed outcome is dropped pellet (moment of arm outstretched)
(6) Failed outcome is pellet missing (moment of arm outstretched)
(7) Cued successful outcome (moment of arm outstretched)
(8) Cued failed outcome is dropped pellet (moment of arm outstretched)
(9) Cued failed outcome is pellet missing (moment of arm outstretched)

We binned the neural activity into 0.1-s time bins, and we represented each behavioural event as 1’s or 0’s across the 0.1-s time bins. We shifted each of the nine behavioural events in time steps of 0.1 s to produce more time-shifted behavioural events (from 2 s before the event to 5 s after the event, 9 × 71 = 639 time-shifted behavioural events). We then used a custom code in Python wrapping scikit-learn to find a weight or GLM coefficient (Extended Data Fig. 10a) associated with each of these time-shifted behavioural events. We used a linear link function between the time-shifted behavioural events and the neural activity. To fit the GLM, we used fivefold cross-validation and held out 10% of the data for testing. The resulting GLM coefficients attempted to relate the time-shifted behavioural events to the neural activity. The coefficients associated with each type of behavioural event provide a picture of how that behavioural event predicts neural activity in time. To get the coefficients for a failed outcome, we averaged the coefficients for the two types of failures, (A) dropped pellet and (B) reach to a missing pellet.
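
The construction of the time-shifted regressors can be illustrated with a short NumPy sketch. This is not the code in the repository cited in this section; the function name `time_shifted_design` and the column ordering are assumptions. Each of the nine event columns is copied at 71 lags (−2 s to +5 s in 0.1-s steps), which gives 9 × 71 = 639 columns.

```python
import numpy as np

def time_shifted_design(events, bin_s=0.1, pre_s=2.0, post_s=5.0):
    """events: (n_bins, n_events) array of 0/1 indicators, one column per
    behavioural event. Returns the design matrix of shape
    (n_bins, n_events * n_lags) and the lags in bins. Requires n_bins to
    exceed the largest lag."""
    events = np.asarray(events, dtype=float)
    n_bins, n_events = events.shape
    lags = np.arange(-int(round(pre_s / bin_s)), int(round(post_s / bin_s)) + 1)
    X = np.zeros((n_bins, n_events * len(lags)))
    for e in range(n_events):
        for k, lag in enumerate(lags):
            col = np.zeros(n_bins)
            if lag >= 0:
                col[lag:] = events[:n_bins - lag, e]   # regressor active after the event
            else:
                col[:lag] = events[-lag:, e]           # regressor active before the event
            X[:, e * len(lags) + k] = col
    return X, lags
```

A column at lag L equals 1 in time bin t when the event occurred in bin t − L, so the fitted weight for that column describes the neural activity L bins after (or, for negative L, before) the event.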

Our goal is to find a GLM that is a good fit to the data. We used regularization to prevent overfitting. Regularization adds a penalty that is a function of the magnitude of the GLM coefficients. Hence, with regularization, more parsimonious solutions are preferred. There are different approaches to regularization. We performed a hyperparameter sweep over various regularization parameters to find the regularization parameters resulting in a GLM with the highest R² regression score function (coefficient of determination):

$$R^2=1-\frac{\mathrm{SS}_{\mathrm{res}}}{\mathrm{SS}_{\mathrm{tot}}}$$

where SS_res is the sum of squares of residuals after subtracting the model fit, and SS_tot is the total sum of squares (proportional to the variance of the data). These two regularization parameters were used: α and l1_ratio. At α = 0, this is ordinary least squares, and there is no regularization of the model. At α ≠ 0 and l1_ratio = 0, this is Ridge regression. At α ≠ 0 and l1_ratio = 1, this is Lasso regression. Otherwise, we used ElasticNet (see scikit-learn documentation). We tested α = 0 and all combinations of values for the regularization parameters: α = [0.01,0.1,1] and l1_ratio = [0,0.1,0.5,0.9,1]. We performed this hyperparameter sweep and fit the GLM separately for each neuron. All code is freely available on GitHub (https://github.com/kimerein/k-glm).
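
As an illustration of the scoring and of the search space (not the code in the repository above), the coefficient of determination and the list of tested hyperparameter settings can be written as follows. The names `r2_score` and `grid` are chosen for this sketch, and representing the unregularized case as `(0.0, None)` is an assumption.

```python
import itertools
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot

alphas = [0.01, 0.1, 1]
l1_ratios = [0, 0.1, 0.5, 0.9, 1]
# alpha = 0 (ordinary least squares) plus every (alpha, l1_ratio) combination.
grid = [(0.0, None)] + list(itertools.product(alphas, l1_ratios))
```

The grid therefore contains 1 + 3 × 5 = 16 settings per neuron, and the setting with the highest R² would be retained.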

Clustering the GLM coefficients in the POP

To study whether there is neural activity in the striatum that represents both the reach outcome and its context, we considered the GLM coefficients assigned to the POP (Extended Data Fig. 10b). The POP is the time period after the arm is outstretched (see ‘Definition of the post-outcome period’) and continuing for 5 s. We took the GLM coefficients from 0 s to 5 s for each of these four behavioural events:

(1) Successful outcome (success)
(2) Failed outcome (failure)
(3) Cued successful outcome (cue × success)
(4) Cued failed outcome (cue × failure)

We called the POP coefficients for each of these behavioural events a ‘kernel’. We smoothed the kernels with a 0.08-s time bin, then max-normalized the kernels. Note that there are now four kernels per neuron. We concatenated the four kernels to make a data vector for each neuron. We considered only neurons with POP coefficients greater than zero. (The excluded neurons had GLM coefficient assignments related to other behavioural events, for example, the cue, or GLM coefficient assignments before the POP period but no GLM coefficient greater than zero in the POP period for the four behavioural events listed above.) Finally, we performed k-means clustering of these vectors to partition them into two clusters (see t-distributed stochastic neighbour embedding (t-SNE) with labels ‘Clust 1’ and ‘Clust 2’ in Extended Data Fig. 10d). For visualization purposes only, we plotted these two clusters in a low-dimensional space using t-SNE in MATLAB (t-SNE parameters: Euclidean distance, perplexity = 150; Extended Data Fig. 10d).

Setting up the tensor regression

We used only the training set to train the regression and later validated using the test set. The goal of the tensor regression (Extended Data Fig. 10f–l) was to predict the behavioural condition (that is, cued success, cued failure, uncued success or uncued failure) from the neural activity. The model can be considered a multilinear (3D) reduced-rank multinomial regression. We attempted to predict the current behavioural condition from the neural activity of the 1,000 SPNs. Typically, there is not a unique solution to this problem, so model comparison was used to choose a rank for the model. Backpropagation via the ADAM optimizer was used to optimize the coefficient weights.

Furthermore, we aimed to find interpretable patterns in the data. Hence, we searched for a regression that could be decomposed into a low-rank sum of rank-1 outer products (that is, a Kruskal tensor). Thus, we searched for a low-dimensional representation that captures the major features of the relationship between the behavioural condition and the neural data. The low dimensionality of this decomposition makes the solution found by the optimization algorithm easier to interpret.

We set up the regression as follows. For simplicity, we trial-averaged the responses of each neuron within each of the four behavioural conditions (Extended Data Fig. 10f):

(1) Successful outcome (success)
(2) Failed outcome (failure)
(3) Cued successful outcome (cued success)
(4) Cued failed outcome (cued failure)

We then time-shifted the failure responses (2 and 4 above) to align the timing of the dopamine dip after a failure (approximately 1.6 s after the arm outstretched) to the timing of the dopamine peak after a success (approximately 0.83 s after the arm outstretched). Although dopamine was measured in a separate group of mice by dLight fibre photometry, we observed that the timing of the post-success dopamine peak and post-failure dopamine dip were quite consistent across mice (not shown). Therefore, we chose the timing of the peak or dip from the averaged data across mice and used those time points to shift the neural data before the tensor regression.

We did not have a trial dimension, because we trial-averaged. For each behavioural condition, there were N neurons by T time points. Putting together the four behavioural conditions, we ended up with a 3D matrix with dimensions, N neurons by T time points by C behavioural conditions (Extended Data Fig. 10g). This 3D matrix, or tensor, was the input to the regression.

We performed a multinomial logistic regression, because we are trying to predict a categorical variable, not a numeric variable, in this case. The categorical variable is the behavioural condition (that is, cued success, cued failure, uncued success or uncued failure). We used custom code wrapping PyTorch in Python to regress the behavioural condition on the input matrix. The output of the model is in the form of a Kruskal tensor, that is, a set of components, each comprising three 1D vectors, or factors: an N-dimensional, T-dimensional and C-dimensional vector. Taking the outer product of each set of vectors and summing the resulting 3D arrays makes a rank-R beta weight tensor. The inner product of the input tensor with this beta weight tensor produces the output logits for the multinomial regression model. Vectors in the Kruskal tensors can be thought of as the weights, or loadings. By considering these vectors, we can observe the loadings onto each modality (that is, neurons (Extended Data Fig. 10j, left), time points (Extended Data Fig. 10j, middle) and behavioural conditions (Extended Data Fig. 10j, right)). We enforced a non-negativity constraint on the optimized Kruskal tensor weights corresponding to the neuron vectors (that is, factors) only. The other two vectors (that is, factors for time points and behavioural conditions) were allowed to be positive, negative or zero valued. The final tensor regression model was selected to be of rank 2 and thus produced 2 components (see ‘Selecting the rank of the tensor regression’). One component was associated with a specific pattern of activity after a success versus failure. The second component was associated with a different pattern of activity after a success versus failure. These two components tended not to share neurons (Extended Data Fig. 10j, left), suggesting that they represented two different groups of cells. All code is freely available on GitHub (https://github.com/kimerein/tensor_regression).
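
The relationship between the factors, the beta weight tensor and the logits can be illustrated with NumPy. This sketch is not taken from the repository above. The function names are hypothetical, and it assumes that each sample presented to the model is an N × T activity matrix whose inner product with the N × T slice of the weight tensor for each condition gives one logit per condition.

```python
import numpy as np

def kruskal_to_tensor(neuron_f, time_f, cond_f):
    """Factor matrices have shapes (N, R), (T, R) and (C, R). Returns the
    (N, T, C) weight tensor: the sum over the R components of the outer
    product of each component's three vectors."""
    return np.einsum("nr,tr,cr->ntc", neuron_f, time_f, cond_f)

def condition_logits(x, beta):
    """x: (N, T) activity for one sample; beta: (N, T, C) weight tensor.
    Returns one logit per behavioural condition."""
    return np.einsum("nt,ntc->c", x, beta)

def softmax(z):
    """Convert logits to class probabilities."""
    z = z - z.max()          # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

With R = 2, the two columns of the neuron factor matrix are the non-negative neuron loadings of the two components.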

Tensor regression optimization

We randomly initialized the N neurons by T time points by C behavioural conditions tensor, which represents the regression (see ‘Setting up the tensor regression’), by sampling the parameters from the uniform distribution between 0 and 1, scaled by a constant. This constant is a hyperparameter called Bcp_init_scale in the code (see https://github.com/kimerein/tensor_regression). We set Bcp_init_scale to 0.625. We then optimized the tensor, using a learning rate of 0.007 and minimizing the cross-entropy loss using the ADAM optimizer (see torch.nn.CrossEntropyLoss and torch.optim.Adam), until convergence.

Tensor regression regularization

We used Ridge (L2) regularization, which adds a penalty proportional to the squared magnitude of the parameters. This penalty is added to the loss function, which the optimization attempts to minimize (see ‘Tensor regression optimization’).

Selecting the rank of the tensor regression

Before running the optimization, we must manually select the rank, or number of components, of the tensor regression (Extended Data Fig. 10h,i). The rank can be thought of as roughly analogous to the number of components in principal components analysis or reduced-rank regression. To choose the rank, we re-ran the tensor regression optimization many times, obtaining a solution with a different rank each time. We re-ran the tensor regression optimization ten times for each of the following ranks: 1, 2, 3, 4 and 5. We present the results in Extended Data Fig. 10h,i. First, we found that the loss (we used the cross-entropy loss; see torch.nn.CrossEntropyLoss) was not much worse when the solution was a two-rank solution versus a three-rank, four-rank or five-rank solution (Extended Data Fig. 10i). Therefore, we chose to present a two-rank solution (Extended Data Fig. 10i, arrows), which is simpler to interpret.

Choosing a specific tensor regression solution

Next, we considered the ten different, two-rank solutions produced by running the tensor regression optimization ten times. We noticed that one solution loaded the two components onto two different and largely non-overlapping groups of neurons. We measured the overlap as the ‘joint loading penalty’, J, defined as the pairwise sum of factor loadings onto the same neuron over the pairwise difference of factor loadings onto the same neuron (Extended Data Fig. 10j), that is,

$$J=\frac{w_{n,i}+w_{n,j}}{w_{n,i}-w_{n,j}}$$

where n is a neuron in the set of neurons N; $i,j$ are pairs of factors in the set of factors F given $i\ne j$; and $w_{n,i}$ is the loading (or weight) of factor i onto neuron n for the N-dimensional ‘neuron’ vector component of the Kruskal tensor. Note that $w_{n,i}$ is always positive, as described above (‘Setting up the tensor regression’). Hence, as the response of a neuron is described more unevenly by the different factors belonging to different components, the penalty J decreases. We chose the solution to the tensor regression optimization that minimized J. This was a solution that loaded the neuron factors of two components onto two largely separate and non-overlapping groups of neurons (Extended Data Fig. 10h, arrow). Note that this solution also utilized the two components equally overall, as measured by the ‘component weight’, that is, the sum of the absolute value of all mean-subtracted parameter weights (Extended Data Fig. 10i, arrows).

Validation of tensor regression

The tensor regression describes the relationship between the neural data and the behaviour based on the training set. To validate our tensor regression, we asked whether this solution is useful to describe the relationship between the neural data and the behaviour for the test set. The test set contains a set of trials independent from the training set. We used the tensor regression to predict behavioural successes versus failures from the neural activity of the test set. The regression correctly predicted behavioural successes versus failures for the test set (Extended Data Fig. 10k), suggesting that there is something detected by the regression that is consistent across the training and test sets. We shuffled the neuron ID, and this markedly degraded the prediction. We shuffled the time points, and this dramatically degraded the prediction. Shuffling both neuron ID and time points further degraded the prediction (Extended Data Fig. 10l).

The simpler approach to the neuron groups 1 and 2

Both approaches (approach 1: clustering GLM coefficients, and approach 2: tensor regression) produced two groups of neurons, which have different response properties. We analysed these two groups of neurons, populating all parts of Fig. 5, based on each approach, and we found that either approach (approach 1: clustering GLM coefficients, or approach 2: tensor regression) produced qualitatively similar results (not shown). However, we decided to use a simpler approach (Extended Data Fig. 10m,n) to separate the neurons into two groups for our presentation in Fig. 5. All approaches revealed consistent structure in the data that was able to predict the behaviour from the neural activity. We arrived at this simpler approach as follows. We observed that component 1 from the tensor regression indicated higher activity that tends to decrease after a success (Extended Data Fig. 10j). We captured this pattern using the ‘modulation index’ after a success (Extended Data Fig. 10m). The modulation index, m, was defined as

$$m=\frac{c_{\text{2 to 5 s}}-c_{\text{0 to 2 s}}}{c_{\text{2 to 5 s}}+c_{\text{0 to 2 s}}}$$

where $c_{\text{2 to 5 s}}$ is the average GLM coefficient from 2 s to 5 s after the arm is outstretched, and $c_{\text{0 to 2 s}}$ is the average GLM coefficient from 0 s to 2 s after the arm is outstretched. For a success, we calculated $m_{\text{success}}$ for the success GLM coefficients and $m_{\text{cued success}}$ for the cued success GLM coefficients. We averaged $m_{\text{success}}$ and $m_{\text{cued success}}$ to get the modulation index after a success, presented in Extended Data Fig. 10m,n. We also observed that component 2 from the tensor regression indicated slightly increasing and sustained activity after a failure (Extended Data Fig. 10j). We captured a pattern of sustained modulation after a failure using the ‘sustained metric’ (Extended Data Fig. 10m). The sustained metric, s, was defined as

$$s=|c_{\text{1 to 5 s}}|$$

where $c_{\text{1 to 5 s}}$ is the average GLM coefficient from 1 s to 5 s after the arm is outstretched. We calculated $s_{\text{failure}}$ for the failure GLM coefficients and $s_{\text{cued failure}}$ for the cued failure GLM coefficients. We averaged $s_{\text{failure}}$ and $s_{\text{cued failure}}$ to get the sustained metric after a failure, presented in Extended Data Fig. 10m,n. The k-means clustering of GLM coefficients produced a division that qualitatively matched these observations (see purple versus cyan dots representing neurons in Extended Data Fig. 10n, top). For simplicity, we decided to just draw a line that separated the purple neurons from the blue neurons in Extended Data Fig. 10n, bottom. We used this line to divide neurons for the analysis presented in Fig. 5. Both of the more complicated approaches (that is, clustering GLM coefficients and tensor regression) motivated our decision to use this line (and not some other boundary) to separate the neurons in Fig. 5 into two groups. Only the data in the training set was used to draw the separation boundary in Extended Data Fig. 10n, bottom, whereas conclusions about its utility were drawn from its application to the test set.

Decoding the behaviour from average unit firing rates

We used only the test set to attempt to decode trial identities (Fig. 5k). To determine whether the neural activity of SPNs in the POP encodes the four behavioural conditions, that is, cued success, cued failure, uncued success or uncued failure, we measured, in each of these behavioural conditions separately, the trial-averaged firing rate of each SPN over the time window 1–5 s after the outstretched arm. We excluded the 1-s window immediately after the outstretched arm to ensure that the cue offset precedes the analysed time window by more than 0.75 s (Fig. 5i). We were not interested in the immediate cue-evoked response but rather whether the cue information continues to be represented after the outcome is known. We considered neurons belonging to either group 1 or group 2, as classified by the methods described above using only the training set for the classification. We ran a bootstrap with 100 iterations to plot how group 1 versus group 2 neuronal firing rates represent the four behavioural conditions (Fig. 5k). At each iteration of the bootstrap, from the group 1 neurons, we randomly sub-sampled n neurons with replacement, and from the group 2 neurons, we randomly sub-sampled n neurons with replacement. We then averaged the firing rates of all group 1 neurons and plotted this as the value along the y axis in Fig. 5k. We averaged the firing rates of all group 2 neurons and plotted this as the value along the x axis in Fig. 5k. There were four behavioural conditions for each sub-sampled set of n neurons. Hence, the 400 points in Fig. 5k represent the average firing rates of group 1 versus group 2 neurons, for each of the behavioural conditions. We found that this mapping, at least partially, separated the cued successes from uncued successes, and both success types from failures. 
To quantify the quality of this separation, we used linear discriminant analysis (LDA) to attempt a three-way separation of behavioural conditions (cued success versus uncued success versus failure) based on the points in Fig. 5k. We measured the accuracy of the LDA prediction. Higher prediction accuracies indicated better separation. We reported the accuracy of the LDA prediction for different numbers of neurons sub-sampled, n (Fig. 5k, bottom-right).
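
The bootstrap that generates the points in Fig. 5k can be sketched as follows. This is a minimal NumPy illustration rather than the analysis code used in the study; the function name `bootstrap_points` and the array layout are assumptions, and the subsequent LDA step is not shown.

```python
import numpy as np

def bootstrap_points(rates_g1, rates_g2, n, n_iter=100, seed=0):
    """rates_g1, rates_g2: (n_neurons, n_conditions) arrays of trial-averaged
    firing rates (1-5 s after the outstretched arm) for group 1 and group 2.
    Returns an array of shape (n_iter, n_conditions, 2), where [..., 0] is the
    group 2 mean (x axis) and [..., 1] is the group 1 mean (y axis)."""
    rng = np.random.default_rng(seed)
    rates_g1 = np.asarray(rates_g1, dtype=float)
    rates_g2 = np.asarray(rates_g2, dtype=float)
    pts = np.empty((n_iter, rates_g1.shape[1], 2))
    for it in range(n_iter):
        # Sub-sample n neurons with replacement from each group.
        pick1 = rng.integers(0, len(rates_g1), size=n)
        pick2 = rng.integers(0, len(rates_g2), size=n)
        pts[it, :, 0] = rates_g2[pick2].mean(axis=0)
        pts[it, :, 1] = rates_g1[pick1].mean(axis=0)
    return pts
```

With 100 iterations and four behavioural conditions, the output holds 100 × 4 = 400 points, each labelled by its behavioural condition for the three-way LDA.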

Shuffled average unit firing rates

To determine whether the separation of neurons into groups 1 and 2 provides any meaningful information, we took all neurons identified as belonging to group 1 or group 2, then shuffled the identities of these neurons before attempting the decoding of the behavioural condition from the neural activity. Figure 5k, top right, shows what happens as a result of this shuffling. Note that successes, and, in particular, the uncued success, are no longer separable from failures. The shuffle decreased the separation of the four behavioural conditions and the quality of the decoding. This indicates that the assignment of neurons into groups 1 or 2 provides added information that helps to decode the current behavioural condition. However, note that some information remains in the activity of all the neurons combined (along the diagonal y = x in Fig. 5k, top right). We also performed a second type of shuffle. For this second shuffle, we maintained the unit identities but shuffled the average firing rates with respect to the behavioural conditions. For example, if neuron 1 had average firing rates of 0.5, 2, 4 and 0 spikes per second for the four behavioural conditions of cued success, cued failure, uncued success and uncued failure, respectively, then after shuffling, neuron 1 had average firing rates of 4, 0, 0.5 and 2 spikes per second for the four behavioural conditions of cued success, cued failure, uncued success and uncued failure, respectively. As expected, this second shuffle also disrupted the decoding of the current behavioural condition (Fig. 5k, bottom-right).
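
The second shuffle can be illustrated with a short sketch. The function name `shuffle_conditions` is hypothetical, and permuting each neuron independently of the others is an assumption of this illustration.

```python
import numpy as np

def shuffle_conditions(rates, seed=0):
    """rates: (n_neurons, n_conditions) array of average firing rates.
    Keeps unit identities but permutes each neuron's rates across the
    behavioural conditions (independently per neuron: an assumption)."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    return np.stack([rng.permutation(row) for row in rates])
```

Each row keeps the same set of firing-rate values, so only the assignment of rates to behavioural conditions is disrupted.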

Decoding the behaviour from single-trial firing rates

We used only the test set to attempt to decode trial identities (Fig. 5l). As described above, the average firing rates of the neurons could be used to decode the behavioural condition (that is, cued success, cued failure, uncued success and uncued failure). To test whether single-trial firing rates provided sufficient information to perform similar decoding, we measured the firing rate of each neuron on each trial, averaged over the time window 1–5 s after the arm was outstretched. We ran a bootstrap with 100 iterations. In each iteration, we randomly sub-sampled n neurons with replacement from the group 1 neurons and n neurons with replacement from the group 2 neurons. Then, we randomly sampled one single trial from each unit, for each behavioural condition. For each behavioural condition, we averaged the n single trials within each group. We plotted the average activity of the group 1 neurons on the y axis and the average activity of the group 2 neurons on the x axis (Fig. 5l). Therefore, 100 points are plotted (one per bootstrap iteration) for each behavioural condition. We used LDA to attempt a three-way separation of these points based on the behavioural condition (cued success versus uncued success versus failure). We plotted the accuracy of the LDA prediction of the behavioural condition as a function of the number of trials sub-sampled (Fig. 5l, bottom-right).
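The bootstrap procedure described above can be sketched as follows, using only the Python standard library. The function name `bootstrap_points`, the nested-dictionary layout of `trials_by_unit`, the unit names, and the toy firing rates are hypothetical and serve only to make the example runnable; they are not taken from the authors' analysis code.

```python
import random

def bootstrap_points(trials_by_unit, group1, group2, conditions,
                     n, iterations=100, seed=0):
    """Build one (x, y) point per bootstrap iteration and condition.

    trials_by_unit[unit][condition] is a list of single-trial firing rates,
    each already averaged over the post-reach time window (assumed layout).
    x is the mean over n sampled group 2 single trials; y is the mean over
    n sampled group 1 single trials.
    """
    rng = random.Random(seed)
    points = {c: [] for c in conditions}
    for _ in range(iterations):
        # Sub-sample n units from each group, with replacement.
        units1 = rng.choices(group1, k=n)
        units2 = rng.choices(group2, k=n)
        for c in conditions:
            # One randomly chosen single trial per sampled unit.
            y = sum(rng.choice(trials_by_unit[u][c]) for u in units1) / n
            x = sum(rng.choice(trials_by_unit[u][c]) for u in units2) / n
            points[c].append((x, y))
    return points

conditions = ("cued success", "cued failure",
              "uncued success", "uncued failure")
# Toy data: constant rates so the resulting points are easy to check.
toy = {
    "g1_a": {c: [2.0, 2.0] for c in conditions},
    "g1_b": {c: [2.0] for c in conditions},
    "g2_a": {c: [1.0, 1.0, 1.0] for c in conditions},
}
points = bootstrap_points(toy, ["g1_a", "g1_b"], ["g2_a"], conditions, n=4)
```

Each entry of `points` then holds 100 pairs, one per bootstrap iteration, which could be passed to an LDA classifier to estimate how well the conditions separate for a given n.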

Shuffled single-trial firing rates

First, we shuffled the identities of the group 1 and group 2 neurons, before attempting to decode the behavioural condition from neural activity (Fig. 5l, top right). This disrupted the decoding. Second, we randomly permuted the time window-averaged firing rates of single trials with respect to the behavioural conditions of those single trials (Fig. 5l, bottom right). This shuffle also disrupted the decoding.
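The first shuffle can be illustrated with a short sketch. It assumes that "shuffling the identities" means randomly reassigning units to group 1 or group 2 while preserving the number of units in each group; the text does not state this explicitly, and the function name `shuffle_group_identities` and the unit labels are hypothetical.

```python
import random

def shuffle_group_identities(group1, group2, rng):
    """Randomly reassign units to two groups, keeping the group sizes.

    Assumption: group sizes are preserved, so only the membership changes.
    The input lists are left unmodified.
    """
    pooled = list(group1) + list(group2)
    rng.shuffle(pooled)
    return pooled[:len(group1)], pooled[len(group1):]

# Hypothetical unit labels.
g1 = ["a", "b", "c"]
g2 = ["d", "e"]
new1, new2 = shuffle_group_identities(g1, g2, random.Random(1))
```

The shuffled groups can then be used in place of the original groups when computing the points for decoding, which gives a reference level of accuracy for comparison with the unshuffled assignment.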

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.