Animals
Male and female C57BL/6J mice (8 to 12 weeks old; Janvier Labs, France) or DAT-iCre mice on a C57BL/6J background were group-housed under standard conditions (12 h:12 h light:dark, ~22 °C, ~50% humidity) and maintained in triads prior to testing (see Supplementary Methods for details). All experiments and procedures were performed in accordance with European Commission directives 219/1990, 220/1990 and 2010/63, and approved by the ESPCI and the ethical committee no. 059 under APAFIS #34335-2021121318085835.
Lone condition and microsociety experiment
Mice were placed either individually or in triads in a large 50 × 50 cm environment and continuously tracked using the Live Mouse Tracker system36, allowing the identity and behaviour of individual mice to be monitored over extended periods. Before the experiment, all mice were implanted with a radio frequency identification (RFID) microchip (Biomark APT-12 PIT Tag, Biomark) under the shoulder skin.
Triads (3 males, 3 females, or 2 females and 1 male) stayed for 8 days and 7 nights in the environment, whereas lone mice were housed for 5 days and 4 nights.
The cage was composed of different zones that were freely accessible to all mice at all times. The arena contained a lever on one side and a food dispenser/magazine on the opposite side. Each lever press delivered a 20 mg pellet (TestDiet 5TUL/1811142 purified, Bio-Concept), after which the lever became inactive for 5 s. A nose poke in the magazine produced a beam break, which was used as a proxy for consumption of the pellet by that mouse. Outside the 5 s delay after each lever press, any mouse could press the lever at any moment and consume the pellet at any time following its release. Mice were weighed and health-checked daily. No mouse was removed from the task for either weight loss or aggression.
Social replacement experiment
To test role stability, we performed a reconfiguration experiment in which a male previously identified as a Scrounger during the first week of group housing was rehoused with two task-naive males (Scrounger–naive–naive condition). Mice were kept in the same semi-natural environment as described above, and behaviour was recorded continuously for an additional 7 days. We retained only triads in which a Scrounger was reliably identified at the end of the first week (13 out of 15 triads). To track individual trajectories across time, the retained Scrounger from week 1 was linked to its corresponding behavioural position in week 2, allowing comparison of lever press counts, distances to archetypes, and profile classification before and after reconfiguration.
Live Mouse Tracker system
Behaviour was monitored continuously (24 h daily, 7 days a week) using a dual acquisition pipeline combining time-stamped, animal-identified operant events (lever, magazine/dispenser TTLs (Transistor-Transistor Logic); MedAssociates) with continuous video tracking to quantify locomotion and social/spatial organization (Live Mouse Tracker)36. Individual lever presses, nose pokes and complete sequences were extracted from the database by matching TTLs to mouse identity within predefined lever/magazine zones. A complete sequence was defined as a lever press followed by the same mouse reaching the magazine within 6 s; otherwise, the sequence was not counted (typically because another mouse was already at the magazine waiting for the food) (see Supplementary Methods for details). Gain was defined as retrieving a pellet within 6 s after a conspecific pressed the lever, whereas loss was defined as a pellet retrieved by a conspecific within 6 s after the subject’s own press.
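The definitions of complete sequence, gain and loss can be illustrated with a minimal Python sketch. It is not the authors' extraction code: the Event record, the function name and the rule that the first nose poke within the 6 s window claims the pellet are assumptions made for illustration only.

```python
from dataclasses import dataclass

WINDOW_S = 6.0  # from the text: magazine visit within 6 s of a lever press


@dataclass(frozen=True)
class Event:
    t: float     # timestamp in seconds
    mouse: str   # animal identity (hypothetical label, e.g. an RFID code)
    kind: str    # "press" (lever) or "poke" (magazine beam break)


def classify_presses(events):
    """Tally complete sequences, gains and losses per mouse."""
    events = sorted(events, key=lambda e: e.t)
    counts = {}

    def bump(mouse, key):
        counts.setdefault(mouse, {"complete": 0, "gain": 0, "loss": 0})[key] += 1

    for i, ev in enumerate(events):
        if ev.kind != "press":
            continue
        for nxt in events[i + 1:]:
            if nxt.t - ev.t > WINDOW_S:
                break  # nobody reached the magazine in time: not counted
            if nxt.kind != "poke":
                continue
            if nxt.mouse == ev.mouse:
                bump(ev.mouse, "complete")   # presser retrieved its own pellet
            else:
                bump(nxt.mouse, "gain")      # a conspecific retrieved the pellet
                bump(ev.mouse, "loss")       # and the presser lost it
            break  # assumption: the first poke in the window takes the pellet
    return counts


example = [
    Event(0.0, "A", "press"), Event(2.0, "A", "poke"),    # complete sequence for A
    Event(10.0, "A", "press"), Event(12.0, "B", "poke"),  # gain for B, loss for A
    Event(30.0, "B", "press"), Event(40.0, "B", "poke"),  # too late: not counted
]
counts = classify_presses(example)
```

In this toy example, mouse A ends with one complete sequence and one loss, and mouse B with one gain.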
Tube test
We assessed social hierarchy in a subset of mice using a tube test conducted before and after the microsociety task. Three days before testing, mice were habituated to traverse a 30-cm tube (2.5-cm diameter) connecting two cages; habituation was reached after ≥10 successful crossings. On the following day, tube-test trials were performed by placing two mice at opposite ends of the tube; in triads, each mouse faced the two others once per trial. A win/loss was scored when one mouse retreated such that its hind paws touched the cage floor. Five trials were run before the microsociety task, and five additional trials after 7 days of group housing; the apparatus was cleaned between sessions. For Extended Data Fig. 4a, we quantified each mouse’s rank across the five pre-task trials and compared the final pre-task rank (trial 5) to the post-task rank to visualize rank stability and shifts.
Behavioural analyses
Behavioural data were extracted from the database (PyCharm/MySQL) and analysed in MATLAB and R. Lever press counts (#LP) were computed per day for each mouse. To account for inter-cage variability in overall activity, #LP was normalized within each cage by dividing the #LP per mouse by the total #LP in the cage on that day (%LP). This normalization was applied prior to all archetypal analyses involving social triads. The percentage of complete sequences (%CS) was calculated individually for each mouse, as the number of complete sequences divided by the total #LP on the same day. For Extended Data Fig. 3a,b, polar plots were generated from xy coordinates of representative male (n = 42) and female (n = 42) mice, using a coordinate system centred and aligned so that 0° corresponded to the food dispenser. This standardization allowed us to quantify the time each mouse spent oriented towards the food dispenser during particular events—specifically, in response to a conspecific mouse lever press. The arena was partitioned into four zones (water, food, lever-left area and lever-right area), and mouse orientation was quantified at and 1 s after conspecific lever presses.
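The two normalizations can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original MATLAB/R code: the function names are hypothetical, the denominator of %CS is read here as the mouse's own lever presses on that day, and the handling of days without any press is a choice made for the sketch.

```python
def percent_lp(presses_by_mouse):
    """%LP: each mouse's share of the cage's total lever presses on one day."""
    total = sum(presses_by_mouse.values())
    if total == 0:
        # Assumption: a day without any press gives 0% for every mouse.
        return {mouse: 0.0 for mouse in presses_by_mouse}
    return {mouse: 100.0 * n / total for mouse, n in presses_by_mouse.items()}


def percent_cs(complete_sequences, lever_presses):
    """%CS: complete sequences as a percentage of the mouse's presses that day."""
    if lever_presses == 0:
        # Assumption: %CS is undefined for a mouse that never pressed.
        return float("nan")
    return 100.0 * complete_sequences / lever_presses


# Hypothetical daily counts for one triad.
shares = percent_lp({"A": 60, "B": 30, "C": 10})
cs_a = percent_cs(complete_sequences=45, lever_presses=60)
```

With these counts, mouse A accounts for 60% of the presses in the cage and completes 75% of its own sequences.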
Archetypal analysis
Archetypal analyses and visualizations were performed in R using the archetypes package (version 2.2-0.1). Archetypal analysis identifies k idealized behavioural profiles (archetypes) spanning the boundaries of variability in a multivariate dataset and yielding individual α-coefficients and distances to each archetype. The number of archetypes (k) was selected using residual sum of squares as a function of k (elbow criterion) and inspecting solution interpretability in lone and social datasets (see Extended Data Figs. 1d and 2c). For Fig. 1, archetypes were computed in a 2D space (#LP, %CS), yielding two archetypes (Achievers and Storers). In Fig. 2, the three-archetype reference space (Workers, Scroungers and Storers) was built from the full dataset (n = 195) using eight features: %LP and %CS (days 5–7) and food pellets gained from/lost to conspecifics. New cohorts (for example, mixed-sex or dopamine-manipulated triads; Fig. 5) were projected into this fixed space to derive α-coefficients, assign behavioural roles, compute distances to archetypes and compare group compositions. The same archetypal framework was applied to simulated triads (e-triads): archetypes were defined from a reference simulation with separated β values (high-β ‘males’ and low-β ‘females’), enabling robust identification of emergent strategies (Fig. 4g). All other simulated conditions (including Extended Data Fig. 7) were projected into this space. Distances to archetypes derived from α-coefficients were related to dopaminergic signals and firing rates using linear models; expected values at each archetype were estimated from the intercept at zero distance with 95% confidence intervals from prediction errors (Fig. 3). Cage-level compositions were compared to sex-specific null distributions generated by random sampling (10,000 iterations) preserving empirical archetype proportions (Fig. 2i). See also Supplementary Methods for details.
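The null distribution for cage-level compositions can be sketched in Python. The text specifies only random sampling over 10,000 iterations that preserves empirical archetype proportions; the sketch assumes this is done by permuting role labels across mice and regrouping them into triads, and it uses a common add-one convention for the empirical P value. Function names and labels are hypothetical.

```python
import random
from collections import Counter


def null_composition_counts(labels, n_iter=10_000, triad_size=3, seed=0):
    """Shuffle role labels across mice and tally triad compositions per draw."""
    if len(labels) % triad_size:
        raise ValueError("number of mice must be a multiple of the triad size")
    rng = random.Random(seed)
    pool = list(labels)
    draws = []
    for _ in range(n_iter):
        rng.shuffle(pool)  # a permutation keeps the empirical role proportions
        triads = [tuple(sorted(pool[i:i + triad_size]))
                  for i in range(0, len(pool), triad_size)]
        draws.append(Counter(triads))
    return draws


def empirical_p(observed, draws, composition):
    """Share of null draws with at least `observed` cages of this composition."""
    hits = sum(d[composition] >= observed for d in draws)
    return (hits + 1) / (len(draws) + 1)  # add-one convention (assumption)


# Hypothetical cohort: 12 mice, four of each role.
roles = ["Worker", "Scrounger", "Storer"] * 4
draws = null_composition_counts(roles, n_iter=200)
mixed = ("Scrounger", "Storer", "Worker")  # sorted tuple used as the key
p_mixed = empirical_p(observed=0, draws=draws, composition=mixed)
```

Running the sampling separately on the male and female label pools would give the sex-specific versions mentioned in the text.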
In vivo electrophysiology
Mice were deeply anaesthetized with isoflurane (3% induction, 1–2% maintenance) and extracellular single-unit recordings were performed in the VTA using glass micropipette electrodes (6–9 MΩ, 0.5% NaCl). Signals were amplified and digitized at 25 kHz (spike 2) while sampling the central VTA (anterior–posterior (AP) −3.1 to −4.0 mm, medial–lateral (ML) 0.3–0.7 mm, dorsal–ventral (DV) 4.0–4.8 mm), with electrode tracks spaced by ≥0.1 mm. Spontaneously active dopamine neurons were identified using established electrophysiological criteria (Supplementary Methods). Activity and bursting (% spikes within bursts, %SWB) were quantified in 60-s windows shifted every 15 s (Supplementary Methods).
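The windowed quantification can be sketched in Python. The 60 s window and 15 s step come from the text; the burst definition is given only in Supplementary Methods, so the sketch assumes the classical criteria for dopamine neurons (burst onset at an inter-spike interval below 80 ms, burst end at an interval above 160 ms). Function names are hypothetical.

```python
def burst_spike_count(spikes, onset_isi=0.080, offset_isi=0.160):
    """Count spikes that belong to bursts (assumed 80/160 ms ISI criteria)."""
    in_burst = False
    n = 0
    for prev, cur in zip(spikes, spikes[1:]):
        isi = cur - prev
        if not in_burst and isi < onset_isi:
            in_burst = True
            n += 2   # both spikes of the pair that opens the burst
        elif in_burst and isi <= offset_isi:
            n += 1   # the burst continues with the current spike
        else:
            in_burst = False
    return n


def sliding_windows(spikes, duration, width=60.0, step=15.0):
    """Return (window start, firing rate in Hz, %SWB) for each window."""
    out = []
    start = 0.0
    while start + width <= duration:
        window = [t for t in spikes if start <= t < start + width]
        rate = len(window) / width
        swb = 100.0 * burst_spike_count(window) / len(window) if window else 0.0
        out.append((start, rate, swb))
        start += step
    return out


# Hypothetical spike times (s): a three-spike burst followed by two single spikes.
spike_times = [1.0, 1.05, 1.10, 1.5, 2.0]
windows = sliding_windows(spike_times, duration=90.0)
```

A 90 s recording yields three windows (starting at 0, 15 and 30 s); in the first, three of the five spikes fall within a burst, giving 60% SWB.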
Stereotaxic surgeries
Stereotaxic surgeries were performed in 6- to 8-week-old DAT-Cre mice under isoflurane anaesthesia. For fibre photometry, AAV1-Syn-FLEX-GCaMP7c was injected unilaterally into the VTA (300 nl, 100 nl min−1; AP −3.20 mm, ML ±0.5 mm, DV −4.20 mm), followed 2 to 3 weeks later by unilateral optic fibre implantation above VTA and fixation with dental acrylic; buprenorphine was given post-operatively. For optogenetics (DAT-Cre males), AAV5-DIO-ChR2-EYFP (or EYFP control) was injected bilaterally in VTA (300 nl per side) and an optic fibre was implanted unilaterally above VTA at a 10° angle (AP −3.20 mm, ML ±0.9 mm, DV −3.95 mm). For chemogenetics (DAT-Cre females), AAV5-DIO-hM4Di-mCherry (or mCherry control) was injected bilaterally in VTA (300 nl per side). Mice recovered in a heated cage and were monitored daily; behavioural testing began at least one week after surgery, and injection or implant sites were systematically verified post hoc by immunohistochemistry (Supplementary Methods).
Immunohistochemistry
After euthanasia, brains were extracted and fixed in 4% paraformaldehyde for at least 3 days at 4 °C, and 60-µm-thick sections were taken through the midbrain on a vibratome. Free-floating sections were blocked (PBS, 3% BSA, 0.2% Triton X-100) and incubated overnight at 4 °C with a mouse anti-tyrosine hydroxylase primary antibody (Sigma T1299, 1:500). Sections were then rinsed and incubated for 3 h at room temperature with a Cy3-conjugated goat anti-mouse secondary antibody (Jackson, 1:500), mounted with ProLong Gold plus DAPI, and imaged on a Zeiss epifluorescence microscope (ZEN); grayscale images were acquired and false-coloured in ImageJ for visualization (Supplementary Methods).
Fibre photometry
DAT-Cre mice injected with AAV1-Syn-FLEX-GCaMP7c and implanted in the VTA underwent fibre photometry experiments in the lone or social context. A Doric Lenses fibre photometry system was employed to record fluorescence signals reflecting dopaminergic neuron activity in the VTA. Fluorescence was excited with a 465-nm LED driven in lock-in mode (220.537 Hz) and routed through a Mini Cube (FMC4_AE(405)_E(460–490)_F(500–550)_S) to the implanted fibre via a patch cord and zirconia sleeve. The emitted light was converted to an electrical signal by a photoreceiver (AC low setting), before being transmitted through another optic patch cord to the Mini Cube via a dedicated fibre optic adaptor. Signals were acquired in Doric Neuroscience Studio at 12 kHz and low-pass filtered at 12 Hz. For the lone condition, the mice were recorded for between one and two hours at the beginning of the dark cycle, when they were active, during the first day and the last two days of the experiment. For the social condition, the mice were also recorded at the beginning of the dark cycle, one after the other, for between one and two hours each.
Analysis of fibre photometry recordings
Fluorescence signals were first detrended (biexponential fit) to correct for photobleaching, then re-centred by adding back the pre-detrend mean; ΔF/F was computed relative to a baseline fluorescence signal. Dopamine-related activity was quantified using peri-event time histograms time-locked to behaviourally defined TTL events (lever press, nose poke) with 100-ms bins, converted to event-wise z-scored ΔF/F using the 5-s pre-event baseline, and then smoothed by Gaussian convolution (MATLAB, gausswin; 100-bin window). For each event type, peristimulus time histograms were computed over a −10 s to +10 s window, and response magnitude was quantified as the mean z-scored ΔF/F in a post-event window (0 to +1.5 s) relative to a pre-event window (−10 to −5 s). Statistical significance of event-evoked responses was assessed by paired comparisons of pre-event versus post-event window values across mice, using paired t-tests or Wilcoxon signed-rank tests as appropriate (described in figure legends, Supplementary Table 1 and Supplementary Methods). Focused versus unfocused conspecific lever press events in Scroungers (Fig. 3) were scored visually on the basis of orientation to the lever/dispenser and proximity to the dispenser (within 5 cm; Supplementary Methods).
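The event-wise z-scoring and window comparison can be sketched in Python. This is not the original MATLAB pipeline: detrending and Gaussian smoothing are omitted, the function names are hypothetical, and the baseline is assumed to be the −10 to −5 s pre-event window (the text could also be read as the 5 s immediately before the event).

```python
from statistics import mean, stdev

BIN_S = 0.1  # 100-ms bins, as in the text


def to_bin(t_s):
    """Convert a time in seconds to a number of 100-ms bins."""
    return round(t_s / BIN_S)


def peri_event_z(trace, event_bin, window=(-10.0, 10.0), baseline=(-10.0, -5.0)):
    """Z-score one peri-event segment against its own pre-event baseline."""
    lo, hi = event_bin + to_bin(window[0]), event_bin + to_bin(window[1])
    b0, b1 = event_bin + to_bin(baseline[0]), event_bin + to_bin(baseline[1])
    if lo < 0 or hi > len(trace):
        raise ValueError("event too close to the edge of the recording")
    base = trace[b0:b1]
    mu, sd = mean(base), stdev(base)
    return [(v - mu) / sd for v in trace[lo:hi]]


def window_mean(z, start_s, end_s, window_start_s=-10.0):
    """Mean z-score between two times given relative to the event."""
    return mean(z[to_bin(start_s - window_start_s):to_bin(end_s - window_start_s)])


# Synthetic trace: alternating 0/1 noise plus a step of +5 for 1.5 s after the event.
event_bin = 150
trace = [float(i % 2) for i in range(300)]
for i in range(event_bin, event_bin + 15):
    trace[i] += 5.0

z = peri_event_z(trace, event_bin)
pre = window_mean(z, -10.0, -5.0)   # pre-event window from the text
post = window_mean(z, 0.0, 1.5)     # post-event window from the text
```

The pre and post values obtained for each mouse would then enter the paired comparison described above.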
Optogenetic experiments
In male DAT-Cre mice injected with AAV5-Ef1α-DIO-ChR2(H134R)-EYFP or AAV5-Ef1α-DIO-EYFP, optical stimulations were performed with an ultra-high-power LED (470 nm, Prizmatix) coupled to a patch cord (500 μm core, NA = 0.5, Prizmatix) with an output intensity of 5–10 mW. We applied a 20 Hz optogenetic stimulation protocol (5 ms light pulse) for 15 min, delivered twice: 24 h and 1 h before the start of the microsociety task (in the social environment). No significant changes were observed in the behaviours of the mice after the stimulation.
Chemogenetic experiments
In female DAT-Cre mice injected with AAV5-hSyn-DIO-hM4Di-mCherry or AAV5-hSyn-DIO-mCherry, CNO (water soluble, Hellobio) was administered through a water bottle. The CNO solution was introduced 24 h before the microsociety experiment and remained available during the first day (day 1), and was then replaced by normal water. The concentration of CNO was determined on the basis of a dosage of 5 mg kg−1 assuming a daily water consumption of 5 ml per mouse. A 200 µl solution of CNO at a concentration of 10 mg ml−1 was prepared and administered in 100 ml of water in each bottle.
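The dose arithmetic can be checked with a short Python sketch. Stock concentration, stock volume, bottle volume and assumed water intake come from the text; the body weight of 20 g is an assumption introduced here for illustration (it is not stated in this section), and the 0.2 ml added to the bottle volume is neglected.

```python
STOCK_MG_PER_ML = 10.0      # CNO stock concentration (from the text)
STOCK_VOLUME_ML = 0.2       # 200 µl of stock per bottle (from the text)
BOTTLE_VOLUME_ML = 100.0    # water per bottle (from the text)
INTAKE_ML_PER_DAY = 5.0     # assumed daily water consumption (from the text)
BODY_WEIGHT_KG = 0.020      # ASSUMPTION for illustration: a 20 g mouse


def cno_dose_mg_per_kg(body_weight_kg=BODY_WEIGHT_KG):
    """Daily CNO dose implied by the bottle preparation."""
    cno_in_bottle_mg = STOCK_MG_PER_ML * STOCK_VOLUME_ML        # 2 mg
    bottle_conc_mg_per_ml = cno_in_bottle_mg / BOTTLE_VOLUME_ML  # 0.02 mg/ml
    daily_intake_mg = bottle_conc_mg_per_ml * INTAKE_ML_PER_DAY  # 0.1 mg per day
    return daily_intake_mg / body_weight_kg


dose = cno_dose_mg_per_kg()
```

Under these assumptions the bottle preparation gives approximately 5 mg kg−1 per day; a heavier mouse drinking the same volume would receive a proportionally lower dose.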
Modelling
Building a behavioural model of e-mouse behaviours in lone and social conditions
The experimental environment was modelled as six states (Fig. 4a,e, rooms 1–4, and lever and dispenser positions). The number and sex of agents (e-mice) present in the environment were varied: (1) one (male or female) e-mouse in the lone experiment; (2) three male or female e-mice in social experiments; or (3) 1 male and 2 females in the mixed-box experiment. A state transition occurred at each time step, with transition probabilities determined by a softmax over the Q-values of all states accessible from the current state. In the softmax, the inverse temperature parameter β controlled the exploitation–exploration trade-off, with lower β values producing more stochastic exploration and higher β values promoting exploitation of higher-valued transitions. Q learning occurred after each transition from a departure state to an accessible arrival state. After each transition, the value of the selected move was updated from a prediction error combining the obtained reward and the best expected future value from the arrival state. Updates were scaled by a learning rate (α), and a discount factor (γ) controlled the weight of future outcomes. Furthermore, e-mice experienced satiety, which scaled action probabilities and the learning rate, and fatigue, which affected both pressing and eating. Satiety and fatigue were used to scale the action pace in the simulation so that it was consistent with the experimental measures. The lever could not be pressed for 3 time steps after each press (to mimic the 5 s lever unavailability in the experiments). Complete information on the full model and observables is given in Supplementary Methods.
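The softmax choice rule and the Q-value update can be sketched in Python. This is a minimal illustration of standard Q learning with softmax action selection, not the full model: satiety, fatigue, the lever lock-out and the other agents are omitted, and the state names, table layout and parameter values are hypothetical.

```python
import math


def softmax_probs(q_values, beta):
    """Transition probabilities from Q-values; beta is the inverse temperature."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def q_update(Q, state, arrival, reward, accessible, alpha, gamma):
    """Update the value of the move state -> arrival and return the prediction error."""
    best_next = max(Q.get((arrival, s), 0.0) for s in accessible[arrival])
    delta = reward + gamma * best_next - Q.get((state, arrival), 0.0)
    Q[(state, arrival)] = Q.get((state, arrival), 0.0) + alpha * delta
    return delta


# Hypothetical two-state fragment of the environment.
accessible = {"lever": ["dispenser"], "dispenser": ["lever"]}
Q = {}
delta = q_update(Q, "lever", "dispenser", reward=1.0,
                 accessible=accessible, alpha=0.1, gamma=0.9)

p_low_beta = softmax_probs([1.0, 0.0], beta=0.5)   # more exploratory
p_high_beta = softmax_probs([1.0, 0.0], beta=5.0)  # more exploitative
```

With the same pair of Q-values, the higher β concentrates probability on the better-valued transition, which is the exploitation–exploration trade-off described above.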
Reduced model of social interactions
We built a reduced theoretical model (the ‘reduced model’) to assess, within a mathematically tractable framework, the causal mechanisms by which specialized behaviours emerge under social interactions. To do so, we derived the reduced model from the reinforcement learning model, based on a continuous-time version of the Q dynamics, that is, ordinary differential equations (ODEs). In the ODE system, learning and behavioural dynamics operated at a slower time scale than that of individual choices in the full model, so that actions (that is, state transitions, lever presses and eating) were described probabilistically. In this framework, we performed a qualitative analysis to determine the number and stability of fixed points of the learning and behavioural (state) variables as a function of parameters (with a focus on β, which is essential in setting social interactions in the full model). Moreover, to reduce dimensionality for better tractability, we considered a simpler setup in which the environment contained only two positions for e-mice (the lever L and the food dispenser D) and only two e-mice (which allowed us to assess social interactions), and we did not consider fatigue or satiety. The reduced model ODEs could be expressed in a tractable form (Supplementary Methods). The main results of the qualitative analysis of the system are summarized in Supplementary Table 2 (Supplementary Information). Full information on the derivation and analysis of the reduced model is given in Supplementary Methods.
Statistics
A priori power analyses were not used to predetermine sample sizes. Animals were randomized to groups at the time of viral infection or behavioural testing. Statistical analyses were performed using MATLAB and R. Normality was assessed with the Shapiro–Wilk test; normally distributed data were analysed with independent or paired t-tests, and non-normal data with Mann–Whitney or Wilcoxon signed-rank tests. Repeated-measures ANOVA (one-way or two-way, as appropriate) was used for designs with multiple factors, with Bonferroni–Holm post hoc correction. Chi-square (χ²) tests were used to compare proportions or categorical distributions between groups.
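The Bonferroni–Holm correction can be illustrated with a short Python sketch of the standard step-down procedure. It is not the authors' MATLAB/R code, and the function name and example P values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest P value is multiplied by (m - k + 1) and capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw P values from three post hoc comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

For these three values the adjusted P values are 0.03, 0.06 and 0.06, so only the first comparison remains below a 0.05 threshold.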
For the archetypal analyses, linear regression models (lm function in R) were used to assess the relationship between behavioural or dopaminergic responses and distance to each archetype. Models included both main effects and archetype × distance interactions, allowing estimation of archetype-specific slopes and intercepts. To compare predicted responses at zero distance (intercepts), marginal means were extracted with the emmeans package and pairwise contrasts performed with Tukey correction. Confidence intervals for predicted values (95% confidence interval, shown in figures) were obtained via standard error propagation (predict(…, se.fit = TRUE)), and model significance was assessed using type II ANOVA (car::Anova) and adjusted R². These procedures were applied identically to behavioural, photometry and neurophysiological datasets.
Unless otherwise specified, all statistical tests were two-tailed. In cases in which specific hypotheses were tested regarding the directionality of the effect—such as expected increases or decreases in dopaminergic firing following ChR2 stimulation or hM4Di inhibition—one-tailed tests were used and explicitly reported.
Data presentation (mean ± s.e.m. or mean ± 95% confidence interval), significance thresholds and the statistical tests used are specified in each figure legend; full statistical details (test statistics, degrees of freedom, exact P values and multiple-comparison procedures) are provided in Supplementary Table 1.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

