Patient recruitment
Experiments were conducted according to protocol guidelines approved by the Institutional Review Board for Baylor College of Medicine and Affiliated Hospitals (H-50885 for the Neuropixels recordings and H-18112 for the EMU recordings). All of the patients recruited for the Neuropixels recordings were diagnosed with drug-resistant temporal lobe epilepsy and were scheduled to undergo an anteromesial temporal lobectomy for seizure control. All of the patients provided written informed consent to participate in the study and were aware that participation was voluntary and would not affect their clinical course. The included patients’ ages ranged from 25 to 54 years (mean ± s.d., 39.6 ± 11.8), with three female and four male patients. Four resections were on the left side, and three were on the right. In one individual (p3), recordings were performed in the middle temporal lobe before resection. None of the patients reported explicit memory of intraoperative events when the case was discussed in the post-operative care unit or while they recovered in the hospital the next day.
Note that we include for comparison purposes a cohort of awake patients listening to podcast stimuli. These patients were recruited from patients undergoing invasive recordings in the EMU at Baylor St Luke’s Hospital. Details on methods for this group of patients were reported previously21,34,52,53,54.
Neuropixels data acquisition set-up and intraoperative recordings
Neuropixels 1.0-S probes (IMEC) with 384 recording channels (total recording contacts = 960, usable recording contacts = 384) were used for recordings (dimensions: 70 μm width, 100 μm thickness, 10 mm length). The Neuropixels probe, consisting of both the recording shank and the headstage, was individually sterilized with ethylene oxide (Bioseal)6. Our intraoperative data acquisition system comprised a custom-built rig including a PXI chassis fitted with an IMEC/Neuropixels PXIe acquisition module (PXIe-1071) and a National Instruments DAQ (PXI-6133) for acquiring neuronal signals and any other task-relevant analogue/digital signals, respectively. Our recording rig was certified by Biomedical Engineering at Baylor St Luke’s Medical Center, where the intraoperative recording experiments were conducted. A high-performance computer (10-core processor) running the open-source software SpikeGLX 3.0 and OpenEphys v.0.6x was used for neural data acquisition (the action potential (AP) band was band-pass filtered from 0.3 kHz to 10 kHz and acquired at a 30 kHz sampling rate; the LFP band was band-pass filtered from 0.5 Hz to 500 Hz and acquired at a 2,500 Hz sampling rate). We used a short-map probe channel configuration for recording, selecting the 384 contacts located along the bottom third of the recording shank.
Audio was played through a separate computer using pregenerated .wav files and captured at 30 kHz or 1 kHz on the NIDAQ through a coaxial cable splitter that sent the same signal to speakers adjacent to the patient. MATLAB (MathWorks) in conjunction with a LabJack (LabJack U6) was used to generate a continuous TTL pulse train whose pulse width was modulated by the current timestamp and recorded on both the neural and audio data files. Online synchronization of the AP and LFP files was performed by the OpenEphys recording software. Offline synchronization of the neural and audio data was performed by calculating a scale and offset factor via a linear regression between the timestamps of the reconstructed TTL pulses, and confirmed with visual inspection of the aligned traces.
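The offline scale-and-offset estimation can be sketched as a least-squares fit between matched TTL timestamps; the pulse times, drift and offset values below are hypothetical:

```python
import numpy as np

def align_clocks(audio_ttl_times, neural_ttl_times):
    """Estimate the scale and offset that map audio-clock times onto the
    neural clock, via least-squares regression on matched TTL pulses."""
    scale, offset = np.polyfit(audio_ttl_times, neural_ttl_times, 1)
    return scale, offset

# Hypothetical example: the audio clock runs 0.01% slow and starts 2.5 s late
neural = np.arange(0, 100, 1.0)          # TTL pulse times on the neural clock (s)
audio = (neural - 2.5) / 1.0001          # the same pulses on the audio clock
scale, offset = align_clocks(audio, neural)
aligned = scale * audio + offset         # audio events expressed in neural time
```

After this fit, any audio event time can be mapped into the neural recording's time base, which is then confirmed by visual inspection of the aligned traces.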
Acute intraoperative recordings were conducted in brain tissue designated for resection based on purely clinical considerations. The probe was positioned using a ROSA ONE Brain (Zimmer Biomet) robotic arm and lowered into the brain 5–6 mm from the ependymal surface using an AlphaOmega (Alpha Omega Engineering). The penetration was monitored through online visualization of the neuronal data and through direct visualization with the operating microscope (Kinevo 900). Reference and ground signals on the Neuropixels probe were acquired by connecting to sterile needles placed in the scalp (separate needles inserted at distinct scalp locations for ground and reference respectively).
For all patients (n = 7), we conducted neuronal recordings under general anaesthesia for at most 30 min, as per the experimental protocol. All of the patients were under total intravenous anaesthesia, with propofol as the main anaesthetic for each experimental protocol (Extended Data Table 1). Inhaled anaesthetics were used only for induction and were stopped at least 1 h before recordings. The anaesthesiologist titrated the anaesthetic drug infusion rates so that the BIS monitor (Medtronic) value was between 45 and 60 for the duration of the surgical case55. Notably, BIS values range between 0 (completely comatose) and 100 (fully awake), with standard intraoperative values between 40 and 60. Anaesthetic depth was stable across the brief duration of the experiment. First, recordings took place several hours after anaesthesia induction and several hours before the end of the procedure, so patients were well into the stable portion of the surgery. Second, the anaesthesiologist actively monitored and stably controlled the anaesthesia levels.
For patients p4, p5 and p6, we recorded neuronal activity during passive auditory stimulus presentation. For p4, a sequence of auditory stimuli (pure tones; f1 = 1 kHz, f2 = 3 kHz) was presented with an 80–20 probability distribution, with the less frequent auditory stimulus serving as an auditory oddball stimulus (n = 300 trials). For p5 and p6, we counterbalanced the tones. A sequence of auditory stimuli (pure tones; f1 = 200 Hz, f2 = 5 kHz) was presented with an 80–20 probability distribution, while switching the tone frequency designated as the auditory oddball stimulus halfway through (first half, n = 150 trials, f2 is the auditory oddball; second half, n = 150 trials, f1 is the auditory oddball). We interleaved a washout period (30 trials) using the same auditory stimuli but presented at a 50–50 probability distribution in between the two counterbalanced tasks. The auditory pure-tone stimuli were presented for a 100 ms duration, and the intertrial interval for the auditory oddball task was randomly drawn from between 1 and 3 s.
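The counterbalanced block structure described above can be sketched as follows; trial counts and probabilities follow the text, while the random seed and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def oddball_block(n_trials, oddball_tone, p_oddball=0.2):
    """Sequence of tone labels ('f1'/'f2'); the rarer tone is the oddball."""
    standard = 'f1' if oddball_tone == 'f2' else 'f2'
    return rng.choice([standard, oddball_tone],
                      size=n_trials, p=[1 - p_oddball, p_oddball])

# Counterbalanced design for p5/p6: f2 oddball, 50/50 washout, then f1 oddball
block1 = oddball_block(150, 'f2')
washout = rng.choice(['f1', 'f2'], size=30, p=[0.5, 0.5])
block2 = oddball_block(150, 'f1')

# Intertrial intervals drawn uniformly between 1 and 3 s
itis = rng.uniform(1.0, 3.0, size=330)
```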
Sound stimuli for the auditory oddball task consisted of high- and low-pitched tones. The low-pitched tone was 100 ms duration and 200 Hz, approximating a square wave. The high-pitched tone was 100 ms duration and 5 kHz frequency, also approximating a square wave. These stimuli were constructed to have distinct perceived pitch and salient onset structure. Stimulus waveforms were matched in amplitude. Sounds were delivered in stereo, using a sound delivery system that was calibrated in the testing suite (B&K type 4939-A-011 calibration mic and NEXUS 4939-A-011 conditioning amplifier). Both speakers had a relatively flat frequency response (±5 dB) across the used frequency range (200–6,000 Hz) and no high- or low-frequency roll-off.
For patients p6, p8, p9 and p11, we also recorded neuronal activity during podcast episodes. Patient p6 listened to three stories, each approximately 7 min long, taken from The Moth Radio Hour (https://themoth.org/podcast). The stories were Wild Women and Dancing Queens, My Father’s Hands and Juggling and Jesus. Each episode consists of a single speaker narrating an autobiographical story. Patient p8 listened to Why We Should NOT Look for Aliens—The Dark Forest, an educational video created by the Kurzgesagt group (Kurzgesagt) (https://www.youtube.com/watch?v=xAUJYP8tnRE). The selected stories were chosen to be varied, engaging and linguistically rich.
Micro-CT
As the recordings were performed only in tissue planned for resection, we first removed a small cube of tissue around the probe and then proceeded with the remainder of the resection. The cube specimens were processed according to previously described methods56. In brief, resected specimens were fixed in 4% PFA for 16 h at 4 °C. They were then stabilized using a modified stability buffer (mStability), containing 4% acrylamide (Bio-Rad, 1610140), 0.25% (w/v) VA044 (Wako Chemical, 017-19362), 0.05% (w/v) saponin (Millipore-Sigma, 84510) and 0.1% sodium azide (Millipore-Sigma, S2002). The samples were equilibrated in the hydrogel solution for 16 h at 4 °C before undergoing cross-linking at −90 kPa and 37 °C for 3 h. After cross-linking, excess hydrogel solution was removed, and specimens were washed four times with 1× PBS. Next, the samples were immersed in 0.1 N iodine and incubated with gentle agitation for 24 h at room temperature before being embedded in agarose and imaged using a Zeiss Xradia Context micro-CT at 3 µm per voxel resolution. The acquired back-projection images were reconstructed using Scout-and-Scan Reconstructor (Carl Zeiss, v.16.8) and converted to NRRD format using the Harwell Automated Recon Processor (HARP, v.2.4.1)57, an open-source, cross-platform application developed in Python. The 3D volumes were analysed, and optical sections were captured using 3D Slicer58. All tissue was inspected with a microscope by S.R.H. and her laboratory, and no abnormalities were reported.
Neuronal data processing
Patients did not experience seizures during the surgery (probably owing to propofol anaesthesia), so we did not perform any seizure-related data cleaning. The lack of seizures was confirmed by review of the waveform activity by a trained neurologist.
Motion correction
We used previously developed and validated motion estimation and interpolation algorithms to correct for motion artefacts from brain movement59. Motion was estimated using the DREDge software package (Decentralized Registration of Electrophysiology Data; https://github.com/evarol/DREDge) with a combination of motion traces obtained from the raw LFP and/or AP band data, fine-tuned for individual recordings. Motion correction was then implemented using interpolation methods (https://github.com/williamunoz/InterpolationAfterDREDge). Both the AP and LFP band data were motion corrected and used for further preprocessing and analysis steps. If the estimated motion led to no improvement in the spike localizations, spike sorting instead proceeded with the motion-correction step built into Kilosort 4, without performing interpolation.
Unit extraction and classification
Automated spike detection and clustering were performed with Kilosort 2.0 if motion correction had already been applied using the DREDge algorithm, or with Kilosort 4.060 if motion correction was not applied separately. Manual curation of spike clusters was performed using the open-source software Phy61. Unit quality metrics were calculated using SpikeInterface62; clusters were considered single units if they had a d′ greater than 1 and fewer than 3% of spikes violating a 2 ms interspike-interval refractory period.
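The interspike-interval criterion can be sketched as below; the spike train is simulated, and the 3%/2 ms thresholds follow the text (the d′ computation, which depends on cluster waveform features, is omitted):

```python
import numpy as np

def isi_violation_fraction(spike_times, refractory=0.002):
    """Fraction of interspike intervals shorter than the refractory period (s)."""
    isis = np.diff(np.sort(spike_times))
    return float(np.mean(isis < refractory)) if isis.size else 0.0

# Hypothetical unit: Poisson-like spiking at ~5 Hz over ~200 s
rng = np.random.default_rng(1)
spikes = np.cumsum(rng.exponential(0.2, size=1000))
frac = isi_violation_fraction(spikes)
accept = frac < 0.03   # criterion used alongside d' > 1
```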
LFP data
LFP data were band-pass filtered between 0.1 and 20 Hz and aligned to task events to extract local ERPs. The LFP amplitude in specific bands was calculated by first band-pass filtering the raw signal within defined frequency limits (for example, 70–150 Hz for high gamma) and then taking the absolute value of the Hilbert-transformed complex signal. Given the high correlation between adjacent channels, only ten channels equally spanning the length of the probe were used to calculate statistics.
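A minimal sketch of the band-amplitude computation (band-pass filter followed by the magnitude of the Hilbert analytic signal), using a synthetic 100 Hz test signal; the filter order is an assumption:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_amplitude(lfp, fs, low, high, order=4):
    """Band-pass filter, then take the magnitude of the analytic signal."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype='band')
    filtered = filtfilt(b, a, lfp)          # zero-phase filtering
    return np.abs(hilbert(filtered))        # instantaneous band amplitude

# Synthetic check: a 100 Hz sinusoid of amplitude 2 lies inside the 70-150 Hz band,
# so its envelope should recover that amplitude
fs = 2500                                   # LFP sampling rate from the text
t = np.arange(0, 4, 1 / fs)
lfp = 2 * np.sin(2 * np.pi * 100 * t)
env = band_amplitude(lfp, fs, 70, 150)
```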
Neuronal data analysis
All analyses were performed using custom MATLAB code unless otherwise noted.
Motion analysis
The motion-corrected location estimates were obtained at a 250 Hz sampling frequency using the DREDge algorithm. This signal was downsampled to 10 Hz. The power spectrum of the calculated motion was then estimated using Welch’s overlapped segment averaging estimator for frequencies between 0.1 and 3 Hz. The amount of motion was defined as the r.m.s. deviation of the probe centre’s location trace from its average location.
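A sketch of the motion quantification, assuming a hypothetical 0.25 Hz oscillatory motion trace; the Welch segment length is illustrative:

```python
import numpy as np
from scipy.signal import welch

fs_motion = 10   # motion trace downsampled to 10 Hz, as in the text

# Hypothetical motion trace: 0.25 Hz respiratory-like oscillation (20 um) plus noise
t = np.arange(0, 120, 1 / fs_motion)
rng = np.random.default_rng(2)
location = 20 * np.sin(2 * np.pi * 0.25 * t) + rng.normal(0, 1, t.size)

# Power spectrum via Welch's overlapped segment averaging, restricted to 0.1-3 Hz
freqs, psd = welch(location, fs=fs_motion, nperseg=256)
mask = (freqs >= 0.1) & (freqs <= 3.0)

# Amount of motion: r.m.s. deviation from the average location
rms_motion = np.sqrt(np.mean((location - location.mean()) ** 2))
```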
Tone responses
Both single units and multiunits were used for all analyses. A tone-responsive neuron was defined as having a statistically significant increase in the average firing rate in the first second after tone onset (shifted by 50 ms to account for the neural processing delay to the hippocampus) relative to the preceding 200 ms baseline (P < 0.05, Wilcoxon signed-rank test). Visual representations of the peristimulus average firing rate were smoothed with a causal Gaussian filter (s.d. of 150 ms) for visualization; however, all statistical analyses were performed on the raw spike counts. Response latency was computed as the time to the peak response. A two-component Gaussian mixture model was then fit to the distribution of latencies. Given the trough between the two peaks at 291 ms and evidence that the average oddball response occurred in the first segment, a window of 0–300 ms was used for analyses characterizing tone and oddball selectivity.
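The responsiveness criterion can be sketched as a paired signed-rank test on per-trial firing rates; the simulated rates and the window-to-rate conversion are illustrative assumptions:

```python
import numpy as np
from scipy.stats import wilcoxon

def is_tone_responsive(baseline_counts, response_counts, alpha=0.05):
    """Paired Wilcoxon signed-rank test on per-trial spike counts:
    200 ms pre-tone baseline versus 1 s post-onset response window,
    converted to rates (Hz) so the unequal windows are comparable."""
    base_rate = baseline_counts / 0.2
    resp_rate = response_counts / 1.0
    stat, p = wilcoxon(resp_rate, base_rate, alternative='greater')
    return p < alpha, p

# Hypothetical responsive unit: ~5 Hz baseline, ~12 Hz evoked, 100 trials
rng = np.random.default_rng(3)
base = rng.poisson(5 * 0.2, size=100)    # counts in the 200 ms baseline
resp = rng.poisson(12 * 1.0, size=100)   # counts in the 1 s response window
responsive, p = is_tone_responsive(base, resp)
```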
Neural tuning
To determine response tuning properties, we modelled trial responses in the peristimulus period using general linear regression models. Neural data in the analysis time window of 0–300 ms was used for tuning analyses. Unit response was defined as the average firing rate, LFP power was defined as the root-mean-squared value of the band-pass-filtered LFP, and gamma power was defined as the average gamma band amplitude. All vectors were z-scored to allow for comparison of the neural response modulation across units/channels. The independent variables were effects-coded for tone type (frequency 1 versus frequency 2), trial type (standard versus oddball) and an interaction term (conjunctive coding). We set the α level at 0.05 to determine whether the β coefficient for the independent variables was statistically significant.
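A sketch of the effects-coded design matrix and GLM fit on simulated z-scored responses (coefficient significance testing via standard errors is omitted for brevity; the coefficient values are illustrative):

```python
import numpy as np

# Simulated data: response modulated by tone but more strongly by trial type
rng = np.random.default_rng(4)
n = 300
tone = rng.choice([-1, 1], n)       # effects coding: f1 = -1, f2 = +1
trial = rng.choice([-1, 1], n)      # standard = -1, oddball = +1

# Design matrix: intercept, tone, trial type, and the interaction (conjunctive) term
X = np.column_stack([np.ones(n), tone, trial, tone * trial])
beta_true = np.array([0.0, 0.5, 1.0, 0.0])
y = X @ beta_true + rng.normal(0, 0.5, n)

# Ordinary least-squares fit of the general linear model
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In the actual analysis, the β coefficients would then be tested against zero at α = 0.05 using their standard errors.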
Neuronal population coding
To determine the information content present in the population, an SVM with a linear kernel was trained using tenfold cross-validation for 200 iterations. Accuracy for each iteration was defined as the average accuracy across the folds. Significant coding was defined as the distribution of 200 iterations being statistically different from 0.5 (chance). Algorithm validation was performed by shuffling the dataset and demonstrating that it always performed at chance level. Subsampling was performed to avoid performance bias from an unbalanced dataset (that is, more standard trials than oddball trials). To investigate the neuronal population response dynamics for tone and oddball encoding as a function of time, we used sets of sequential trials (50 trials) from each of the two counterbalanced blocks (total of 100 trials). For example, the first set used trials 1:50 and 181:230, whereas the last set used trials 101:150 and 281:330. Decoding analyses were also run separately for early versus late trials (first 75 versus last 75 trials within a 150-trial block) for tone and oddball encoding, respectively.
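A sketch of the decoding procedure with class-balancing subsampling, using simulated population responses and fewer iterations (50 rather than 200) for brevity; trial counts and effect size are illustrative:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)

# Hypothetical pseudo-population: 40 units, 240 standard and 60 oddball trials
n_units = 40
standard = rng.normal(0.0, 1.0, size=(240, n_units))
oddball = rng.normal(0.6, 1.0, size=(60, n_units))

accuracies = []
for _ in range(50):
    # subsample standard trials to match the oddball count (class balancing)
    idx = rng.choice(240, size=60, replace=False)
    X = np.vstack([standard[idx], oddball])
    y = np.array([0] * 60 + [1] * 60)
    scores = cross_val_score(SVC(kernel='linear'), X, y, cv=10)
    accuracies.append(scores.mean())   # per-iteration accuracy = mean over folds
mean_acc = float(np.mean(accuracies))
```

Significance would then be assessed by comparing the distribution of iteration accuracies against 0.5, with a label-shuffled control confirming chance-level performance.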
Neuronal response learning dynamics
Next, to determine the neural mechanism underlying statistical learning required for oddball detection, we evaluated single-trial response dynamics across the neuronal population. For each trial, we generated a neuronal response population vector. We then computed the Euclidean distance ($\lVert \mathbf{u}-\mathbf{v}\rVert$) and cosine angle ($\cos^{-1}(\mathbf{u}\cdot \mathbf{v}/(\lVert \mathbf{u}\rVert \,\lVert \mathbf{v}\rVert ))$) between the mean vector across all standard trials and each individual oddball unit vector, evaluating each as a function of the oddball index.
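The two population metrics can be computed directly; the example vectors are illustrative:

```python
import numpy as np

def population_distances(standard_mean, oddball_vec):
    """Euclidean distance and cosine angle (radians) between the mean
    standard-trial population vector and a single oddball-trial vector."""
    u, v = np.asarray(standard_mean, float), np.asarray(oddball_vec, float)
    euclid = np.linalg.norm(u - v)
    cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.arccos(np.clip(cos_sim, -1.0, 1.0))  # clip guards rounding error
    return euclid, angle

# Orthogonal example vectors: the angle should be pi/2
d, theta = population_distances([1.0, 0.0], [0.0, 2.0])
```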
Mixed-effects models
Where applicable, we used mixed-effects models to quantify how task conditions affect spike counts and other neurophysiological variables while accounting for the hierarchical structure of the data, with multiple subjects, neurons and channels. For analyses of spike counts, we summed spikes over equivalent durations across task conditions and fit a GLME model using a log link function, modelling spike counts as Poisson distributed. For LFP variables, an LME model was used. All analyses used a random-effects structure for neurons or channels nested within participant.
Continuous-rate RNN model
We implemented a continuous-rate RNN and trained it to perform an oddball detection task, closely mirroring the one used for the experimental dataset. The network contains 200 recurrently connected units (80% excitatory and 20% inhibitory). The network is governed by the following equations:
$$\tau_i\frac{\mathrm{d}x_i}{\mathrm{d}t}=-x_i(t)+W\cdot r(t)+u(t)$$
$$r_i(t)=\frac{1}{1+e^{-x_i(t)}}$$
$$o(t)=W_{\mathrm{out}}\cdot r(t)$$
where τi represents the synaptic decay time constant, xi(t) indicates the synaptic current variable for neuron i at timepoint t, W is the recurrent connectivity matrix (N × N, that is, 200 × 200), and u(t) is the input given to the network at timepoint t. u is a 2 × 200 matrix, in which the first dimension corresponds to the number of input channels and the second dimension to the total number of timepoints. The firing rate of a unit was estimated by passing the synaptic current variable (x) through a standard logistic sigmoid function. The output (o) of the network was computed as a linear weighted sum of the activity of the entire population of units.
In each trial, the network model receives input data mimicking auditory signals. The input consists of two signal streams, each representing a distinct auditory tone (that is, tone A versus tone B; Fig. 3f,g). Only one tone is presented to the network per trial. The model was trained to produce an output signal approaching +1 when tone A was presented and an output signal approaching −1 when tone B was presented. To closely replicate the experimental task design, we used three different sequential contexts during network training. In the first stage, tone A was presented predominantly (80% of trials), followed by an equal distribution of tone A and tone B (50/50) in the second stage. In the third stage, tone B was predominant (80%).
We optimized the network parameters, including recurrent connectivity, readout weights and synaptic decay time constants, using gradient descent via backpropagation through time (BPTT). The network was required to achieve over 95% task accuracy in the current context before a new context was introduced. To evaluate the model’s ability to decode both tone identity and oddball context, we performed linear SVM decoding using population activity from the recurrent units. For each decoding analysis, we generated 100 trials for each condition. A linear SVM classifier was trained using 70% of the trials and tested on the remaining 30%, and this procedure was repeated 100 times to estimate decoding accuracy. Separate SVM classifiers were trained for tone identity and oddball context classification.
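The network dynamics can be sketched with forward-Euler integration; all weights below are random placeholders (the actual parameters were optimized with BPTT), and the gain, time constants and stimulus timing are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, dt = 200, 300, 0.01            # units, timepoints, Euler step (s); T illustrative
tau = np.full(N, 0.05)               # synaptic decay time constants (s), placeholder

# Dale's-law sign structure: first 80% excitatory, last 20% inhibitory units
signs = np.ones(N)
signs[160:] = -1
W = np.abs(rng.normal(0, 0.5 / np.sqrt(N), (N, N))) * signs  # recurrent weights
W_in = rng.normal(0, 1.0, (N, 2))    # input weights for the two tone channels
W_out = rng.normal(0, 1.0 / np.sqrt(N), N)                   # linear readout

u = np.zeros((2, T))
u[0, 50:100] = 1.0                   # present tone A for 50 timesteps

x = np.zeros(N)                      # synaptic current variables
rates, outputs = [], []
for t in range(T):
    r = 1.0 / (1.0 + np.exp(-x))     # sigmoid firing-rate nonlinearity
    # tau_i dx_i/dt = -x_i + W r + W_in u, integrated with forward Euler
    x = x + (dt / tau) * (-x + W @ r + W_in @ u[:, t])
    rates.append(r)
    outputs.append(W_out @ r)
rates, outputs = np.array(rates), np.array(outputs)
```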
Neuronal data analysis: natural language stimuli
Natural language stimuli
All of the patients were native English speakers. The podcast played during the task was automatically transcribed using Assembly AI (https://www.assemblyai.com/). The transcribed words and corresponding timestamp outputs from Assembly AI were converted to a TextGrid and then loaded into Praat63. The original .wav file was also loaded into Praat, and the spectrograms, labels and timestamps were manually checked and corrected to ensure that the word onset and offset times were accurate. This process was then repeated by a second reviewer to ensure the validity of the timestamps. The TextGrid output of corrected words and timestamps from Praat was converted to an Excel file and loaded into MATLAB and Python for further analysis.
Natural language statistics
Word frequency was defined based on a corpus of movie subtitles spanning a total of 51 million words22. Words that did not elicit a response during the duration of the word were excluded from this analysis. To compare the relative contributions to firing rate, a linear model was trained to estimate the logarithmic firing rate from the logarithmic duration and corpus frequency of each word. Word surprisal values were calculated using the GPT-2 large model64 from the Hugging Face Transformers library65, computing the negative log probability of each word conditioned on the preceding context. Specifically, surprisal was defined by the equation
$$\mathrm{surprisal}(w_i)=-\log P(w_i\mid w_{i-1},w_{i-2},\ldots ,w_1)$$
where $P(w_i\mid w_{i-1},\ldots ,w_1)$ refers to the probability of word $i$ given the preceding words.
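The surprisal arithmetic itself is straightforward; in the study, the conditional probabilities come from GPT-2 large (the model call is omitted here), so the probabilities below are illustrative:

```python
import math

def surprisal(prob):
    """Surprisal (in nats) of a word given its conditional probability
    P(w_i | w_1 ... w_{i-1}), e.g. as returned by a language model."""
    return -math.log(prob)

# A highly predictable word carries little surprisal...
low = surprisal(0.9)
# ...while an unexpected word carries much more
high = surprisal(0.01)
```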
We used the pretrained fastText Word2Vec model in MATLAB to extract word embeddings for all words in our dataset66,67. This pretrained model provides 300-dimensional word embedding vectors, trained on 16 billion tokens from Wikipedia, the UMBC webbase corpus and https://statmt.org, to capture semantic relationships between words. Notably, Word2Vec is a non-contextual embedder, so all instances of the same word are represented identically. Some surnames, such as Harwood, and proper nouns, such as Applebee’s, did not have word embeddings and were discarded from the analysis. A simple linear model was trained to predict the firing rate of individual neurons from the semantic matrices using tenfold cross-validation. Accuracy was defined as the correlation between true and predicted firing rates. Words with firing rates of 0 Hz or above 25 Hz were removed from this analysis. To prevent overfitting, principal component analysis (PCA) was used to reduce the dimensionality of the semantic embedding vectors to account for 30% of the variance before modelling. This threshold was defined as the minimum of the r.m.s.e. of the model that balanced under- and overfitting. To predict future or previous words, the alignment between words was shifted forwards or backwards, respectively. This relationship was then fit with a piecewise exponential decay
$$r(i)=\beta_0\,e^{-i/\beta_1}\quad \mathrm{for}\;i\ge 0$$
$$r(i)=\beta_0\,e^{i/\beta_2}\quad \mathrm{for}\;i<0$$
where β0 is the amplitude of the correlation at 0 lag, and β1 and β2 are the decay time constants for positive and negative lags, respectively.
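The piecewise decay can be fit with non-linear least squares; the lag range, noise level and true parameter values below are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def piecewise_decay(i, b0, b1, b2):
    """b0 * exp(-i/b1) for i >= 0; b0 * exp(i/b2) for i < 0
    (decay away from zero lag in both directions)."""
    return np.where(i >= 0, b0 * np.exp(-i / b1), b0 * np.exp(i / b2))

# Hypothetical lag-correlation profile with known parameters plus noise
lags = np.arange(-10, 11, dtype=float)
rng = np.random.default_rng(7)
r_true = piecewise_decay(lags, 0.4, 3.0, 1.5)
r_obs = r_true + rng.normal(0, 0.01, lags.size)

params, _ = curve_fit(piecewise_decay, lags, r_obs, p0=[0.3, 2.0, 2.0])
```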
Word embedding, semantic clustering and part of speech classification
To identify the natural semantic categories present in our word data, all of the unique words heard by the participants were clustered into groups using a word-embedding approach21,68. We used the same 300-dimensional embeddings as in the previous GLM analysis. To compute and visualize semantic clusters, we first applied a t-distributed stochastic neighbour embedding algorithm to the word embedding values to reduce the dimensionality of each unique word on the basis of its cosine distance to all other words, thereby reflecting semantic similarity. Words with similar meanings therefore have similar 2D coordinates. We then applied the k-means clustering algorithm to these 2D word representations and visualized the clustered words on a 2D word map (12 clusters). We then manually inspected and assigned a distinct label to each semantic cluster and adjusted the clusters for accuracy. For example, words bordering the edges of clusters would sometimes be misgrouped and were manually corrected. The final 12 semantic categories of the words are body parts, places, emotional words, mental words, social words, objects, visual words, numerical words, actions, identity words, function words and proper nouns. Correction for multiple comparisons was performed using the Benjamini–Hochberg approach. An SVM was trained for each semantic category (versus all other categories) using a radial basis function kernel. Model training and accuracy metrics were weighted by the relative frequency of each group. We used tenfold cross-validation and 200 iterations.
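A sketch of the t-SNE plus k-means pipeline on stand-in embeddings (random vectors replace the fastText embeddings, the word count and cluster count are reduced for brevity, and the perplexity is an assumption):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)

# Stand-in for 300-dimensional word embeddings: 60 "words" from three loose groups
emb = np.vstack([rng.normal(c, 0.5, size=(20, 300)) for c in (-1, 0, 1)])

# Reduce to 2D with t-SNE on cosine distances, then cluster the 2D coordinates
xy = TSNE(n_components=2, metric='cosine', perplexity=10,
          random_state=0).fit_transform(emb)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(xy)
```

In the actual analysis, the resulting clusters were then inspected, labelled and manually corrected for misgrouped border words.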
To extract the part of speech (POS) for each word in the dataset, we used an automated pipeline through Stanford CoreNLP, a natural language processing toolkit25. We initialized a CoreNLPParser with the ‘pos’ tag type, which specializes in POS tagging. The transcript was first segmented into sentences based on punctuation. Each sentence was then tokenized and passed through the CoreNLPParser’s tagging function. This process leveraged CoreNLP’s linguistic models to analyse the context and structure of each sentence, assigning appropriate POS tags to individual words. The 15 POS types were as follows: noun, adjective, numeral, determiner, conjunction, preposition or subordinating conjunction, auxiliary, possessive pronoun, pronoun, adverb, particle, interjection, verb, wh-word and existential. POS types with fewer than 45 words were removed from the analysis. A similar SVM approach was used for POS classification.
Probe localization
Intraoperative navigation (StealthStation navigation platform, Medtronic) was used to label the probe entry site after the probe was removed from the brain. RAVE was used to transform patient-specific coordinates into MNI152 average space and to plot them on a glass brain with hippocampal segmentation51.
Ethics statement
Experiments were conducted according to protocol guidelines approved by the Institutional Review Board for Baylor College of Medicine and Affiliated Hospitals (H-50885 for the Neuropixels recordings, and H-18112 for the EMU recordings).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

