Ethics statement
The experiment was approved by the institutional ethics committees of each participating data-collecting laboratory, including the Science, Technology, Engineering and Mathematics Ethical Review Committee at the Centre for Human Brain Research, University of Birmingham (ERN_18-0226AP20); the Committee for Protecting Human and Animal Subjects at the School of Psychological and Cognitive Sciences, Peking University (2020-05-07e); the Commissie Mensgebonden Onderzoek Regio Arnhem-Nijmegen at the Centre for Cognitive Neuroimaging at Donders Institute (NL45659.091.14); the Human Research Protection Program Institutional Review Board at Yale School of Medicine (2000027591); the Office of Science and Research Institutional Review Board at New York University Langone Health (i14-02101_CR6); the Boston Children’s Hospital Institutional Review Board at Children’s Hospital Corporation d/b/a Boston Children’s Hospital (04-05-065R); the Institutional Review Board at the University of Wisconsin-Madison (ID: 2017-1299); and the Ethics Council of the Max Planck Society at Max Planck Institute for Empirical Aesthetics (Nr. 2017_12). All participants and patients provided oral and written informed consent before participating in the study. All study procedures were carried out in accordance with the Declaration of Helsinki. Patients were also informed that clinical care was not affected by participation in the study.
Participants
Healthy participants and patients with pharmaco-resistant focal epilepsy participated in this study. The datasets reported here consist of: (1) behaviour, eye tracking and iEEG data collected at the Comprehensive Epilepsy Center at New York University (NYU) Langone Health, the Brigham and Women’s Hospital, the Boston Children’s Hospital (Harvard), and the University of Wisconsin School of Medicine and Public Health (WU); (2) behaviour, eye tracking, MEG and EEG data collected at the Centre for Human Brain Health (CHBH) of the University of Birmingham (UB), and at the Center for MRI Research of Peking University (PKU); and (3) behaviour, eye tracking and fMRI data collected at the Yale Magnetic Resonance Research Center (MRRC) and at the Donders Centre for Cognitive Neuroimaging (DCCN) of Radboud University Nijmegen. For both the MEG and fMRI datasets, one-third of the data that passed quality tests (henceforth, the optimization dataset; see the section ‘Preregistration’ for details about quality test criteria27) were used to optimize the analysis methods, which were subsequently added to the preregistration as an additional amendment. These preregistered analyses were then run on the remaining two-thirds of the data (henceforth, the replication dataset) and constitute the data reported in the main study. This procedure was not used for the iEEG data due to the serendipitous nature of the recording and electrode placement, the rarity of this type of data and the increased difficulty of data collection due to the COVID-19 pandemic.
A total of 97 healthy participants were included in the MEG sample (mean age of 22.79 ± 3.59 years, 54 females, all right handed); 32 of those datasets were included in the optimization sample (mean age of 22.50 ± 3.43 years, 19 females, all right handed) and 65 in the replication sample (mean age of 22.93 ± 3.66 years, 35 females, all right handed). Five additional participants were excluded from the MEG dataset: two because of failure to meet predefined behavioural criteria (that is, Hits of less than 80% and/or False Alarms > 20%), two because of excessive noise from sensors, and one because of incorrect sensor reconstruction. A total of 108 healthy participants were included in the fMRI sample (mean age of 23.28 ± 3.46 years, 70 females, 105 right handed); 35 of those datasets were included in the optimization sample (mean age of 23.26 ± 3.64 years, 21 females, 34 right handed) and 73 in the replication sample (mean age of 23.29 ± 3.37 years, 49 females, 71 right handed). Twelve additional participants were excluded from the fMRI dataset: eight because of motion artefacts, two because of insufficient coverage and two because of incomplete data (with respect to these last two participants, see section 14 of the Supplementary Information for deviations from the preregistration document). For the iEEG arm of the project, a total of 34 patients were recruited. Two patients were excluded owing to incomplete data. Demographic, medical and neuropsychological scores for each patient, when available, are reported in Supplementary Table 25. Three iEEG patients whose behaviour fell slightly short of the predefined behavioural criteria (that is, Hits of less than 70%, False Alarms > 30%) were nonetheless included given the difficulty of obtaining additional iEEG data (see section 14 in Supplementary Information for deviation from the preregistration).
Experimental procedure
Experimental design
To test critical predictions of the theories, five experimental manipulations were included in the experimental design: (1) four stimulus categories (faces, objects, letters and false fonts), (2) 20 stimulus identities (20 different exemplars per stimulus category), (3) three stimulus orientations (front, left and right view), (4) three stimulus durations (0.5 s, 1.0 s and 1.5 s), and (5) task relevance (relevant targets, relevant non-targets and irrelevant).
Stimulus category, stimulus identity and stimulus orientation served to test the theories’ predictions about the representation of the content of consciousness in different brain areas. In addition, stimulus duration served to test predictions about the temporal dynamics of sustained conscious percepts and interareal synchronization. Task relevance served to rule out the effect of task demands, as opposed to conscious perception per se, on the observed effects62. This aspect of the experimental design was inspired by ref. 63.
Stimuli
Four stimulus categories were used: faces, objects, letters and false fonts. These stimuli naturally fell into two clearly distinct groups: pictures (faces and objects) and symbols (letters and false fonts). These natural couplings were aimed at creating a clear difference between task-relevant and task-irrelevant stimuli in each trial block (see the section ‘Procedure’). All stimuli covered a square aperture at an average visual angle of 6° by 6°. Face stimuli were created with FaceGen Modeler 3.1; letter and false font stimuli were generated with MAXON CINEMA 4D Studio (RC – R20) 20.059; object stimuli were taken from the Object Databank64. Stimuli were grey scaled and equated for luminance and size. To facilitate face individuation, faces had different hairstyles and belonged to different ethnicities and genders. Equal proportions of male and female faces were presented. The orientation of the stimuli was manipulated, such that half of the stimuli from each category had a side view (30° and −30° horizontal viewing angle, left and right orientation) and the other half had a front view (0°).
Procedure
Participants performed a non-speeded target detection task (see Supplementary Video 1). The experiment was divided into runs, with four blocks in each run (see the section ‘Trial counts’). In a given block, participants viewed a sequence of single, supra-threshold, foveally presented stimuli belonging to one of four stimulus categories and presented for one of three stimulus durations over a fixation cross that was present throughout the experiment. Within each block, half of the stimuli were task-relevant and half were task-irrelevant. To manipulate task relevance, at the beginning of each block participants were instructed to detect the rare occurrences of two target stimulus identities, one from each relevant category (for pictures, face–object; for symbols, letter–false font), irrespective of their orientation. This was specified by presenting the instruction ‘detect face A and object B’ or ‘detect letter C and false font D’, accompanied by images for each target (see Fig. 1d). Targets did not repeat across blocks. Each run contained two blocks of the face–object task and two blocks of the letter–false font task, with block order counterbalanced across runs.
Accordingly, each block contained three different trial types: (1) targets: the two stimuli being detected (for example, the specific face and object identities); (2) task-relevant stimuli: all other stimuli from the task-relevant categories (for example, the non-target faces–objects); and (3) task-irrelevant stimuli: all stimuli from the two other categories (for example, letters–false fonts). An advantage of this design is that the three trial types enabled a differentiation of neural responses related to task goal, task relevance and simply consciously seeing a stimulus. We confirmed that participants were conscious of the stimuli in both the task-relevant and task-irrelevant trials in a separate experiment, which included a surprise memory test (see section 3 in Supplementary Information).
Stimuli were presented for one of three durations (0.5 s, 1.0 s or 1.5 s), followed by a blank period of a variable duration to complete an overall trial length fixed at 2.0 s. For the MEG and iEEG version, random jitter was added at the end of each trial (mean inter-trial interval of 0.4 s, jittered 0.2–2.0 s, truncated exponential distribution) to avoid periodic presentation of the stimuli. The mean trial length was 2.4 s. For the fMRI protocol, timing was adjusted in two ways. First, the random jitter between trials was increased (mean inter-trial interval of 3 s, jittered 2.5–10 s, with truncated exponential distribution), with each trial lasting approximately 5.5 s. This modification helped to avoid non-linearities in the BOLD signal, which may affect fMRI decoding65. Second, to increase detection efficacy for amplitude-based analyses, three additional baseline periods (blank screen) of 12 s each were included per run (total of 24). The identity of the stimuli was randomized with the constraint that they appeared equally across durations and task conditions. Participants were further instructed to maintain central fixation on a black circle with a white cross and another black circle in the middle throughout each trial (see Supplementary Fig. 1d and Supplementary Video 1 for a demonstration of the experimental paradigm).
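The MEG and iEEG jitter described above can be sketched as inverse-CDF sampling from a truncated exponential. This is only an illustration: the source reports the mean, the range and the distribution family, whereas the scale parameter used here (0.2 s) and the function name are assumptions chosen so that the mean lands near the reported 0.4 s.

```python
import math
import random


def sample_iti(rng, low=0.2, high=2.0, scale=0.2):
    """Draw one inter-trial interval (s) from an exponential truncated to [low, high).

    `scale` is a hypothetical value, not taken from the source.
    """
    width = high - low
    u = rng.random()  # uniform in [0, 1)
    # Inverse CDF of an exponential distribution truncated to [0, width)
    x = -scale * math.log(1.0 - u * (1.0 - math.exp(-width / scale)))
    return low + x


rng = random.Random(0)
itis = [sample_iti(rng) for _ in range(20000)]
mean_iti = sum(itis) / len(itis)
mean_trial_length = 2.0 + mean_iti  # fixed 2.0-s trial plus the jittered interval
```

Under these assumptions, the simulated mean trial length comes out close to the 2.4 s reported for MEG and iEEG.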
Trial counts
The MEG study consisted of 10 runs containing 4 blocks each with 34–38 trials per block, 32 non-targets (8 per category) and 2–6 targets, for a total of 1,440 trials. The same design was used for iEEG, but with half the runs (5 runs total), resulting in a total of 720 trials. For fMRI, there were 8 runs containing 4 blocks each with 17–19 trials per block, 16 non-targets (4 per category) and 1–3 targets, for a total of 576 trials. Rest breaks between runs and blocks were included.
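The reported totals can be checked with simple arithmetic. The sketch below assumes that the number of targets per block averages to the midpoint of the stated range (4 for MEG and iEEG, 2 for fMRI); that averaging assumption is ours, although it reproduces the totals given in the text.

```python
def total_trials(runs, blocks_per_run, non_targets, mean_targets):
    """Total trial count given the per-block composition."""
    return runs * blocks_per_run * (non_targets + mean_targets)


# Mean target counts (4 and 2) are assumed midpoints of the 2-6 and 1-3 ranges.
meg_total = total_trials(runs=10, blocks_per_run=4, non_targets=32, mean_targets=4)
ieeg_total = total_trials(runs=5, blocks_per_run=4, non_targets=32, mean_targets=4)
fmri_total = total_trials(runs=8, blocks_per_run=4, non_targets=16, mean_targets=2)
```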
Data acquisition
Behavioural data acquisition
The task was run on Matlab (PKU: R2018b; DCCN, UB and Yale: R2019b; Harvard: R2020b; NYU: R2020a, and WU: 2021a) using Psychtoolbox (v3)66. The iEEG version of the task was run on a Dell Precision 5540 laptop, with a 15.6′′ Ultrasharp screen at NYU and Harvard and on a Dell D29M PC with an Acer 19.1′′ screen in WU. Participants responded using an eight-button response box (Millikey LH-8; response hand (or hands) varied based on the setting in the patient’s room). The MEG version was run on a custom PC at UB and a Dell XPS desktop PC at PKU. Stimuli were displayed on a screen placed in front of the participants with a PROPixx DLP LED projector (VPixx Technologies). Participants responded with both hands using two 5-button response boxes (NAtA or SINORAD). The fMRI version was run on an MSI laptop at Yale and a Dell Desktop PC at DCCN. In DCCN, stimuli were presented on an MRI compatible Cambridge Research Systems BOLD screen 32′′ IPS LCD monitor, and in Yale they were presented on a Psychology Software Tools Hyperion projection system to project stimuli on the mirror fixed to the head coil. Participants responded with their right hand using a 2 × 2 Current Designs response box at Yale and a 1 × 4 Current Designs response box at DCCN.
Eye tracking data acquisition
For the iEEG setup, eye tracking and pupillometry data were collected using an EyeLink 1000 Plus in remote mode, sampled monocularly at 500 Hz (from the left eye at WU, and depending on the setup at Harvard), or on a Tobii-4C eye tracker, sampled binocularly at 90 Hz (NYU). The MEG and fMRI laboratories used the MEG-compatible and fMRI-compatible EyeLink 1000 Plus Eye-tracker system (SR Research) to collect data at 1,000 Hz. For MEG, eye tracking data were acquired binocularly. For fMRI, data were acquired monocularly from either the left or the right eye, in DCCN and Yale, respectively. For all recordings, a 9-point calibration was performed (except at Harvard, where a 13-point calibration was used) at the beginning of the experiment, and recalibration was carried out as needed at the beginning of each block or run.
iEEG data acquisition
Brain activity was recorded with a combination of intracranial subdural platinum-iridium electrodes embedded in SILASTIC sheets (2.3-mm diameter contacts, Ad-Tech Medical Instrument and PMT Corporation) and/or depth stereo-electroencephalographic platinum-iridium electrodes (PMT Corporation; 0.8 mm in diameter, 2.0-mm length cylinders; separated from adjacent contacts by 1.5–2.43 mm), or depth stereo-electroencephalographic platinum-iridium electrodes (BF08R-SP21X-0C2, Ad-Tech Medical; 1.28 mm in diameter, 1.57 mm in length, 3–5.5-mm spacing). Electrodes were arranged as grid arrays (either 8 × 8 with 10-mm centre-to-centre spacing, 8 × 16 contacts with 3-mm spacing, or hybrid macro–micro 8 × 8 contacts with 10-mm spacing and 64 integrated microcontacts with 5-mm spacing), linear strips (1 × 8/12 contacts), depth electrodes (1 × 8/12 contacts) or a combination thereof. Recordings from grid, strip and depth electrode arrays were done using a Natus Quantum amplifier or a Neuralynx Atlas amplifier. A total of 4,057 electrodes (892 grids, 346 strips and 2,819 depths) were implanted across 32 patients with drug-resistant focal epilepsy undergoing clinically motivated invasive monitoring. A total of 3,512 electrodes (780 grids, 307 strips and 2,425 depths) that were unaffected by epileptic activity, artefacts or electrical noise were used in subsequent analyses. To determine the electrode localization for each patient, a post-operative computed tomography scan and a pre-operative T1 MRI were acquired and co-registered.
MEG data acquisition
MEG was acquired using a 306-sensor TRIUX MEGIN system, comprising 204 planar gradiometers and 102 magnetometers in a helmet-shaped array. The MEG gantry was positioned at 68° for optimal coverage of frontal and posterior brain areas. Simultaneous EEG was recorded using an integrated EEG system and a 64-channel electrode cap (EEG data are not reported here, but are included in the shared dataset). During acquisition, MEG and EEG data were bandpass filtered (0.01–330 Hz) and sampled at 1,000 Hz. The location of the head fiducials, the shape of the head, the positions of the 64 EEG electrodes and the head position indicator (HPI) coil locations relative to anatomical landmarks were collected with a 3D digitizer system (Polhemus Isotrack). ECG was recorded with a set of bipolar electrodes placed on the chest of the participant. Two sets of bipolar electrodes were placed around the eyes (two at the outer canthi of the right and left eyes and two above and below the centre of the right eye) to record eye movements and blinks (EOG). Ground and reference electrodes were placed on the back of the neck and on the right cheek, respectively. The head position of participants on the MEG system was measured at the beginning and end of each run, and also before and after each resting period, using four HPI coils placed on the EEG cap, next to the left and right mastoids and over left and right frontal areas.
Anatomical MRI data acquisition
For source localization of the MEG data with individual realistic head modelling, a high-resolution T1-weighted MRI volume (3 T Siemens MRI Prisma scanner) was acquired per participant. Anatomical scans were acquired either with a 32-channel coil (repetition time (TR)/echo time (TE) = 2,000/2.03 ms; inversion time (TI) = 880 ms; 8° flip angle; field of view = 256 × 256 × 208 mm; 208 slices; 1-mm isotropic voxels, UB) or a 64-channel coil (TR/TE = 2,530/2.98 ms; TI = 1,100 ms; 7° flip angle; field of view = 224 × 256 × 192 mm; 192 slices; 0.5 × 0.5 × 1 mm voxels, PKU). The FreeSurfer standard template (fsaverage) was used for participants lacking an anatomical scan (n = 5).
fMRI data acquisition
MRI data were acquired using a 32-channel head coil on a 3 T Prisma scanner. A session included high-resolution anatomical T1-weighted MPRAGE images (GRAPPA acceleration factor = 2, TR/TE = 2,300/3.03 ms, 8° flip angle, 192 slices, 1-mm isotropic voxels), and a whole-brain T2*-weighted multiband-4 sequence (TR/TE = 1,500/39.6 ms, 75° flip angle, 68 slices, voxel size of 2 mm isotropic, anterior/posterior (A/P) phase-encoding direction, field of view = 210 mm, bandwidth (BW) = 2,090 Hz px−1). A single-band reference image was acquired before each run. To correct for susceptibility distortions, additional scans using the same T2*-weighted sequence, but with inverted phase-encoding direction (inverted readout/phase-encoding (RO/PE) polarity), were collected while the participant was resting at multiple points throughout the experiment.
Preprocessing and analysis details
For readability, we first detail the preprocessing protocols for each of the modalities (iEEG, MEG and fMRI) separately. Then, we describe the different analyses, combining information across the modalities, while noting any differences between them.
Analysis strategy
As part of our testing framework, after excluding a limited number of participants due to data quality checks, we conducted an initial optimization phase on one-third of the MEG (n = 32) and fMRI (n = 35) datasets to evaluate data quality across sites and to optimize analysis pipelines. Following the optimization phase, pipelines were preregistered27 and applied to the novel datasets containing twice as much data (MEG n = 65 and fMRI n = 73).
In the main paper, we report results obtained on the novel, previously unexamined datasets. For iEEG, given the smaller sample, a different analysis strategy was implemented. We refer the reader to the iEEG methods section and text in the main paper for numbers of participants that were entered in each analysis. Results from the optimization phase are reported in section 4 of Supplementary Information. The results of the optimization phase and the preregistered replication phase were compared and deemed to be largely compatible, with some minor exceptions (section 4 of Supplementary Information).
iEEG preprocessing
Data were converted to BIDS67 and preprocessed using MNE-Python (v0.24)68, and custom-written functions in Python and Matlab. Preprocessing steps included downsampling to 512 Hz, detrending, bad channel rejection, line noise and harmonic removal, and re-referencing. Electrodes were re-referenced to a Laplacian scheme69, whereas bipolar referencing was used for electrodes at the edge of a strip, grid or stereo EEG, and the signal was localized at the midpoint (Euclidean distance) between the two electrodes. Electrodes with no direct neighbours were discarded. Seizure-onset zone electrodes, those localized outside the brain and/or containing no signal or high amplitude noise level were discarded. Line noise and harmonics were removed using a one-pass, zero-phase non-causal band-stop FIR filter.
The high-gamma power (70–150 Hz) was obtained by bandpass filtering the raw signal in eight successive 10-Hz-wide frequency bands, computing the envelope using a standard Hilbert transform, and normalizing it (dividing) by the mean power per frequency band across the entire recording. To produce a single high-gamma envelope time series, all frequency bands were averaged together70. Most analyses focused on the high-gamma power as it correlates closely with neural spiking activity71 and with the BOLD signal37. To obtain the event-related potentials (ERPs), the raw signal was low-pass-filtered at 30 Hz with a one-pass, zero-phase, non-causal low-pass FIR filter. Epochs were segmented from 1 s before to 2.5 s after the onset of the stimulus of interest.
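The high-gamma extraction described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the source does not specify the filter used for this step, so the zero-phase fourth-order Butterworth filter, the function name and the synthetic input are all assumptions, and the envelope is normalized by its own mean as a stand-in for the per-band normalization.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def high_gamma_envelope(signal, sfreq, f_low=70.0, band_width=10.0, n_bands=8):
    """Average of per-band normalized Hilbert envelopes (70-150 Hz by default)."""
    edges = f_low + band_width * np.arange(n_bands + 1)  # 70, 80, ..., 150 Hz
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Assumed filter: zero-phase 4th-order Butterworth band-pass
        sos = butter(4, [lo, hi], btype="bandpass", fs=sfreq, output="sos")
        narrow_band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(narrow_band))
        # Normalize by the band's mean over the whole recording
        envelopes.append(envelope / envelope.mean())
    return np.mean(envelopes, axis=0)


# Synthetic 10-s recording at the 512-Hz rate used after downsampling
rng = np.random.default_rng(0)
raw = rng.standard_normal(5120)
hg = high_gamma_envelope(raw, sfreq=512.0)
```

Dividing each band by its own mean before averaging keeps the lower bands, which carry more power, from dominating the combined envelope.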
Surface reconstruction and electrode localization
Electrode positions were determined based on a computed tomography scan coregistered with a pre-implant T1-weighted MRI. A 3D reconstruction of the brain of each patient was computed using FreeSurfer (http://surfer.nmr.mgh.harvard.edu). For visualization, the electrode positions for individual participants were converted to the Montreal Neurological Institute (MNI152) space. As each theory specified a set of anatomical ROIs, after electrode localization, electrodes were labelled according to the FreeSurfer-based Destrieux atlas segmentation72,73 and/or Wang atlas segmentation74.
Identification of task-responsive channels
To identify task-responsive electrodes, we computed the area under the curve (AUC) for the baseline (−0.3 to 0 s) and the stimulus-evoked period (0.05–0.35 s) separately for the task-relevant and task-irrelevant conditions, and compared them per electrode using a Wilcoxon signed-rank test, corrected for false discovery rate (FDR)75. A Bayesian t-test76 was used to quantify evidence for non-responsiveness.
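A minimal sketch of this responsiveness test on synthetic data is given below. The rectangle-rule AUC, the Benjamini–Hochberg implementation, the helper names and the simulated electrodes are assumptions made for illustration; the source specifies only the time windows, the paired test and the FDR correction.

```python
import numpy as np
from scipy.stats import wilcoxon


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    thresholds = q * np.arange(1, p.size + 1) / p.size
    passed = p[order] <= thresholds
    reject = np.zeros(p.size, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


def window_auc(epochs, times, t_start, t_stop):
    """Rectangle-rule area under each trial's curve within [t_start, t_stop)."""
    mask = (times >= t_start) & (times < t_stop)
    dt = times[1] - times[0]
    return epochs[..., mask].sum(axis=-1) * dt


# Synthetic demo: 2 electrodes x 40 trials; only electrode 0 responds.
rng = np.random.default_rng(1)
sfreq = 512.0
times = np.arange(-0.3, 0.5, 1.0 / sfreq)
epochs = rng.standard_normal((2, 40, times.size))
epochs[0][:, times >= 0.05] += 2.0

baseline = window_auc(epochs, times, -0.3, 0.0)
evoked = window_auc(epochs, times, 0.05, 0.35)
p_vals = [wilcoxon(evoked[e], baseline[e]).pvalue for e in range(2)]
responsive = benjamini_hochberg(p_vals)
```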
Identification of category-selective channels
To determine category selectivity for faces, objects, letters and false fonts in the high gamma, we followed the method of Kadipasaoglu and colleagues77. Per category, we computed a d′ (AUC of 0.05–0.4 s) comparing the activation between the category of interest (\(u_j\)) and each of the other categories (\(u_i\)), normalized by the standard deviation of each category, with \(\sigma^2\) denoting the variance per category and \(N\) the number of other categories:
$$d^{\prime} = \frac{u_j - \frac{1}{N}\sum_{i}^{N} u_i}{\sqrt{\frac{1}{2}\left(\sigma_j^{2} + \frac{1}{N}\sum_{i}^{N}\sigma_i^{2}\right)}};\quad i \ne j$$
A permutation test (10,000 permutations) was used to evaluate significance. d′ was computed for the task-relevant and task-irrelevant conditions separately. An electrode was considered selective if it showed selectivity on both tasks.
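The selectivity index and its permutation test can be sketched as below. The one-sided test, the use of sample variances (`ddof=1`), the function names and the synthetic responses are assumptions; the demo uses 1,000 permutations for speed, whereas the analysis described above used 10,000.

```python
import numpy as np


def category_dprime(responses, labels, target):
    """d' of `target` against the average of the other categories."""
    responses = np.asarray(responses, dtype=float)
    labels = np.asarray(labels)
    others = [c for c in np.unique(labels) if c != target]
    mu_j = responses[labels == target].mean()
    var_j = responses[labels == target].var(ddof=1)
    mu_others = np.mean([responses[labels == c].mean() for c in others])
    var_others = np.mean([responses[labels == c].var(ddof=1) for c in others])
    return (mu_j - mu_others) / np.sqrt(0.5 * (var_j + var_others))


def permutation_p(responses, labels, target, n_perm=1000, seed=0):
    """Observed d' and a one-sided p value from label permutations."""
    rng = np.random.default_rng(seed)
    observed = category_dprime(responses, labels, target)
    null = np.array([
        category_dprime(responses, rng.permutation(labels), target)
        for _ in range(n_perm)
    ])
    return observed, (np.sum(null >= observed) + 1) / (n_perm + 1)


# Synthetic electrode: 50 trials per category, faces respond more strongly.
rng = np.random.default_rng(0)
labels = np.repeat(["face", "object", "letter", "false_font"], 50)
responses = rng.standard_normal(200)
responses[labels == "face"] += 1.0
d_obs, p_val = permutation_p(responses, labels, "face")
```

In this reading, an electrode would count as selective only if the test passes in both the task-relevant and the task-irrelevant condition.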
Multivariate analysis electrodes combination
Owing to the sparse and highly variable coverage of iEEG data, all collected electrodes were combined into a ‘super participant’ for multivariate analyses (RSA and decoding). To create a single-trial matrix for the super participant, we equated the trial matrices of all our participants by subsampling to the lowest number of trials in the relevant conditions. Participants that did not complete the full experiment were discarded (n = 3), resulting in a total of 29 participants with 583 electrodes in posterior ROIs and 576 electrodes in prefrontal ROIs. For analyses on stimulus identities, stimuli that were presented fewer than three times to any of the participants across intermediate and long trials in the task-relevant and task-irrelevant trials were discarded. We then subsampled the trials for each identity to three trials per participant. The subsampling procedure was repeated 100 times to avoid random fluctuations induced by the subsampling. The analysis was computed for each repetition and averaged across repetitions.
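The trial-equating step can be illustrated with a small sketch. It assumes one array per participant of shape (trials, electrodes) for a single condition, and uses the mean of the combined matrix as a placeholder for the downstream analysis; the function name, shapes and data are hypothetical.

```python
import numpy as np


def build_super_participant(participant_data, rng):
    """Concatenate electrodes across participants after equating trial counts.

    participant_data: list of arrays shaped (n_trials, n_electrodes),
    all belonging to the same experimental condition.
    """
    n_min = min(d.shape[0] for d in participant_data)
    picked = [
        d[rng.choice(d.shape[0], size=n_min, replace=False)]
        for d in participant_data
    ]
    return np.concatenate(picked, axis=1)  # (n_min, total number of electrodes)


rng = np.random.default_rng(0)
data = [rng.standard_normal((n, e)) for n, e in [(12, 3), (9, 5), (15, 2)]]
example = build_super_participant(data, rng)

# Repeat the random subsampling 100 times and average the resulting statistic.
stats = [build_super_participant(data, rng).mean() for _ in range(100)]
averaged_stat = float(np.mean(stats))
```

Repeating the draw and averaging the statistic reduces the dependence of the result on any single random subset of trials.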
MEG preprocessing
The MEG data were converted to BIDS78 using MNE-BIDS79, and preprocessed following the FLUX Pipeline80 in MNE-Python (v0.24.0)68. Preprocessing steps included MEG sensor reconstruction using a semi-automatic detection algorithm and signal-space separation81 to reduce environmental artefacts. FastICA82 was used to detect and remove cardiac and ocular components from the data for each participant (mean = 2.90 components, s.d. = 0.92). Before ICA, data were segmented, and segments containing muscle artefacts were removed. After preprocessing, data were epoched into 3.5-s segments (1-s pre-stimulus to 2.5-s post-stimulus onset). Trials in which gradiometer values exceeded 5,000 fT cm−1, magnetometer values exceeded 5,000 fT and/or the trial contained muscle artefacts were rejected from the MEG dataset. Finally, to be included in the analyses, participants were required to have a minimum of 30 clean trials per condition. No participants were excluded for failing to meet this criterion.
Source modelling
MEG source modelling was performed using the dynamic statistical parametric mapping method83, based on depth-weighted minimum-norm estimates (MNEs)84,85, on epoched and baseline (−0.5 s to 0 s before stimulus onset) corrected data. To build a forward model, the MRI images were manually aligned to the digitized head shape. A single-shell boundary element model was constructed in MNE-Python based on the inner skull surface derived from FreeSurfer72,73, to create a volumetric forward model (5-mm grid) covering the full-brain volume. The lead field matrix was then calculated according to the head position with respect to the MEG sensor array. A noise covariance matrix for the baseline and a covariance matrix for the active time window were calculated and the combined (that is, sum) covariance matrix was used with the forward model to create a common spatial filter. Data were spatially pre-whitened using the covariance matrix from the baseline interval to combine gradiometer and magnetometer data86.
fMRI preprocessing
Source DICOM data were converted to BIDS using BIDScoin (v3.6.3)87. This includes converting DICOM data to NIfTI using dcm2niix88 and creating event files using custom Python code. BIDS compliance of the resulting dataset was checked using BIDS-Validator. Subsequently, MRI data quality control was performed using MRIQC (0.16.1)89 and custom scripts for data rejection. All (f)MRI data were preprocessed using fMRIPrep (20.2.3)90, based on Nipype (1.6.1)91. For further details on the fMRIPrep pipeline, see preregistration. Custom scripts used NumPy (1.19.2)92 and Pandas (1.1.3)93.
Analysis-specific functional preprocessing
Additional, analysis-specific fMRI data preprocessing was performed using FSL 6.0.2 (FMRIB Software Library)94, Statistical Parametric Mapping (SPM 12) software95, and custom Python scripts (using NiBabel (3.2.2)96 and SciPy (1.8.0)97) after the above-outlined general preprocessing. Functional data for univariate data analyses were spatially smoothed (Gaussian kernel with full-width at half-maximum of 5 mm), grand mean scaled and temporal high-pass filtered (128 s). No spatial smoothing was applied for multivariate analyses.
Contrast of parameter estimates
We modelled BOLD signal responses to the experimental variables by fitting a voxel-wise general linear model (GLM) to the data of each run using FSL FEAT. The following regressors were modelled in an event-related approach, with event duration corresponding to the stimulus duration (that is, 0.5 s, 1.0 s and 1.5 s), and convolved with a double gamma haemodynamic response function: 12 regressors of interest (targets, task-relevant and task-irrelevant stimuli per stimulus category, that is, faces, objects, letters and false fonts) and a regressor of no interest (that is, target screen display). We included the first-order temporal derivatives of the regressors of interest, and a set of nuisance regressors: 24 motion regressors (FMRIB Software Library (FSL)’s standard + extended set of motion parameters) plus a cerebrospinal fluid (CSF) and a white matter (WM) tissue regressor. Each of the 12 regressors of interest was contrasted against an implicit baseline (used in the putative Neural Correlates of Consciousness analysis; see below). In addition, we obtained contrast of parameter estimates for ‘relevant faces versus relevant objects’, ‘relevant letters versus relevant false fonts’, ‘irrelevant faces versus irrelevant objects’, ‘irrelevant letters versus irrelevant false fonts’ (used for the definition of decoding ROIs), ‘relevant and irrelevant faces versus relevant and irrelevant objects’ and ‘all stimuli versus baseline’ (used for the definition of seeds for the generalized psychophysiological interaction (gPPI) analysis). Data were averaged across runs per participant using FSL’s fixed-effects analysis and subsequently averaged across participants using FSL’s FLAME1 mixed-effect analysis. Gaussian random-field cluster thresholding was used to correct for multiple comparisons, using the default settings of FSL, with a cluster formation threshold of one-sided P < 0.001 (z ≥ 3.1) and a cluster significance threshold of P < 0.05.
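To illustrate how an event-related regressor of this kind is built, the sketch below convolves a boxcar of stimulus events with a double-gamma response function and samples it once per TR. It is not FSL FEAT's implementation: the response-function parameters (peak shape 6, undershoot shape 16, ratio 1/6), the function names and the example onsets are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gamma


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Generic double-gamma response; parameter values are assumptions."""
    h = gamma.pdf(t, peak) - ratio * gamma.pdf(t, undershoot)
    return h / h.max()


def event_regressor(onsets, durations, n_scans, tr, dt=0.1):
    """Boxcar events convolved with the response function, sampled once per TR."""
    n_fine = int(round(n_scans * tr / dt))
    boxcar = np.zeros(n_fine)
    for onset, duration in zip(onsets, durations):
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        boxcar[start:stop] = 1.0
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    fine = np.convolve(boxcar, hrf)[:n_fine] * dt
    step = int(round(tr / dt))
    return fine[::step][:n_scans]


# Three hypothetical events with the three stimulus durations, TR = 1.5 s
regressor = event_regressor(
    onsets=[3.0, 12.0, 21.0], durations=[0.5, 1.0, 1.5], n_scans=30, tr=1.5
)
```

Because the event duration enters the boxcar, longer stimuli yield a larger predicted response, which is how stimulus duration is carried into the model.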
Anatomical ROIs
ROIs were defined a priori in consultation with the adversarial theories. They were determined per participant based on the Destrieux atlas73 including both hemispheres, and then resampled to standard MNI space (see Supplementary Table 26). For the connectivity analysis, areas V1/V2 (combining dorsal and ventral) were defined based on the Wang cortical parcellation74. For details on the process of selecting the ROIs and the justification of the ROI selection in the context of this study, see section 10 in Supplementary Information. All anatomical segmentations were performed using FreeSurfer (6.0.1)72.
Behavioural analyses
Log-linear-corrected d′ (ref. 98), false alarms and reaction times were computed per category and stimulus duration, separately (false alarms were also calculated per task relevance, without duration) and per modality (iEEG, MEG and fMRI). These measures were compared with linear or logistic mixed models, where appropriate. For the former, we report analysis of variance omnibus F-tests, and for the latter, omnibus χ2 test from an analysis of deviance. We approximated degrees of freedom using the Satterthwaite method99. Pairwise t-tests following significant interactions were Bonferroni corrected. To estimate Bayesian information criterion (BIC) differences between the original and null logistic models, we used the P values and sample size100 (p_to_bf package in R).
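As a brief illustration of the log-linear correction, the sketch below adds 0.5 to the hit and false-alarm counts and 1 to the corresponding trial totals before converting the rates to z scores, which keeps d′ finite when a rate would otherwise be 0 or 1. The function name and the example counts are hypothetical.

```python
from statistics import NormalDist


def loglinear_dprime(hits, misses, false_alarms, correct_rejections):
    """d' with the log-linear correction (0.5 added to counts, 1 to totals)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Perfect performance stays finite thanks to the correction.
d_perfect = loglinear_dprime(hits=10, misses=0, false_alarms=0, correct_rejections=10)
# Chance-level performance gives a d' of zero.
d_chance = loglinear_dprime(hits=5, misses=5, false_alarms=5, correct_rejections=5)
```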
Eye-tracking analyses
For EyeLink, gaze and pupil data were segmented, and trials with missing data were excluded. Blinks were detected using the Hershman algorithm101, and removed with 200-ms padding102. The EyeLink standard parser algorithm was used for saccade and fixation detection. Saccades were further corroborated using the Engbert and Kliegl103 algorithm. Fixations were baseline corrected (−0.25 s to 0 s). Mean fixation distance, mean blink rate, mean saccade amplitude and mean pupil size were compared in an LMM with category and task relevance as fixed effects, and participant and item as random effects. Separate analyses were carried out on the first 0.5 s after stimulus onset including all trials; and on the 1.5-s trials including time window (0–0.5 s, 0.5–1.0 s and 1.0–1.5 s) as fixed effects. BIC was used to test the models against the null hypothesis models. For Tobii, gaze coordinate data were segmented, missing data were excluded and coordinates were baseline corrected to depict heatmaps of patients’ gaze. Of note, the coordinate data were not added to the LMMs due to their poorer quality with respect to the EyeLink data.
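The blink-padding step can be sketched as a simple mask dilation. The sketch assumes that the 200-ms padding is applied on both sides of each detected blink and that blinks are already available as a per-sample boolean mask; both points, and the function name, are assumptions, as the source does not describe the implementation.

```python
def pad_blinks(is_blink, sfreq, padding_s=0.2):
    """Extend a per-sample blink mask by `padding_s` seconds on both sides."""
    pad = int(round(padding_s * sfreq))
    n = len(is_blink)
    padded = [False] * n
    for i, flag in enumerate(is_blink):
        if flag:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                padded[j] = True
    return padded


# Toy example at 10 Hz: one blink sample at index 5, padded by 2 samples per side.
mask = [False] * 10
mask[5] = True
padded_mask = pad_blinks(mask, sfreq=10.0)
```

Samples flagged in the padded mask would then be excluded from the gaze and pupil traces.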
Decoding analysis
All decoding analyses were performed using a linear support vector machine (SVM; scikit-learn (0.23.2), https://scikit-learn.org/) classifier. Below, we explain how this was done for each one of the predictions.
iEEG decoding was done on the high-gamma signal, averaged over non-overlapping windows of 0.02 s separately for electrodes located in the GNWT and IIT ROIs. The top 200 electrodes (SelectKBest104), as determined by an F-test within a given set of electrodes from the theory ROIs, were used as features for the classifier. Two hundred features were selected to provide a balance between model optimization (for example, feature selection) and participant representation (for example, electrodes or features coming from multiple participants). Statistical significance of decoding performance was assessed via permutation test, randomly permuting the sample labels and repeating the decoding analysis 1,000 times, corrected for multiple comparisons using a cluster-based correction (cluster mass inference with cluster forming threshold at P < 0.05)105,106. Also, to assess the decoding accuracy within unique ROIs (for example, S_temporal_sup of the Destrieux atlas), separate classifiers were trained using all electrodes in a given parcel. Each classifier was fitted using all electrodes in a parcel and time window (GNWT: 0.3–0.5 s, IIT: 0.3–1.5 s) as features, resulting in a single accuracy value per parcel. SelectKBest (200 features for iEEG) feature selection and fivefold cross-validation with three repetitions was used. To assess the statistical significance of the decoding accuracy within unique ROIs (so only one accuracy score is obtained per ROI), P values obtained via permutation tests were corrected for multiple comparisons across all ROIs using FDR correction (q ≤ 0.05; ref. 75). To compute Bayes factors on the decoding accuracy values, we used a β-binomial approach that compares the marginal likelihood under a point-null hypothesis against a flat \(B(\alpha =1,\,\beta =1)\) alternative prior, yielding an analytic Bayes factor.
We then derived the null hypothesis parameters from the empirical null distribution by updating a tight prior centred at chance level (\(B(\alpha =1,000,\,\beta =1,000)\)) with the shuffle-based accuracies, thereby incorporating any bias present in the null distribution.
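As an illustration of the point-null comparison described above, the following minimal Python sketch computes an analytic Bayes factor for k correct classifications out of n test trials. The function names, the treatment of shuffle-based accuracies as counts of correct and incorrect predictions, and the use of the posterior mean as the null rate are assumptions made for illustration; this is not a description of the code used in the study.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log probability of k successes in n trials at success rate p."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)


def null_rate(shuffle_correct, shuffle_total, a=1000.0, b=1000.0):
    """Posterior mean of a tight Beta(a, b) prior centred at chance,
    updated with shuffle-based outcomes (assumed to be available as counts)."""
    return (a + shuffle_correct) / (a + b + shuffle_total)


def bf01(k, n, p0):
    """Bayes factor for the point null p = p0 against a flat Beta(1, 1) alternative.

    Under Beta(1, 1), every outcome k has marginal probability 1 / (n + 1),
    so the ratio of marginal likelihoods has a closed form.
    """
    return exp(log_binom_pmf(k, n, p0) + log(n + 1))


# Hypothetical numbers: 50 of 100 test trials correct, unbiased shuffles.
p0 = null_rate(shuffle_correct=500, shuffle_total=1000)
print(bf01(50, 100, p0))
```

Values above 1 favour the null rate and values below 1 favour the flat alternative. Any bias in the shuffled accuracies shifts p0 away from exactly 0.5.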
MEG decoding was done on bandpass-filtered (1–40 Hz) and downsampled (100 Hz) data. The reconstructed source-level MEG data within a subset of the predefined anatomical ROIs (GNWT: ‘G_and_S_cingul-Ant’, ‘G_and_S_cingul-Mid-Ant’, ‘G_and_S_cingul-Mid-Post’, ‘G_front_middle’, ‘S_front_inf’, ‘S_front_sup’; IIT: ‘G_cuneus’, ‘G_oc-temp_lat-fusifor’, ‘G_oc-temp_med-Lingual’, ‘Pole_occipital’, ‘S_calcarine’, ‘S_oc_sup_and_transversal’, as they showed a high response to the stimulus on the optimization dataset) were extracted for further analysis (500 vertices and 800 vertices per hemisphere for each of the anatomical ROIs defined by the theories). We applied temporal smoothing (0.05-s window, 0.01-s sliding window), computed pseudotrials107, normalized the data and selected the top 30 features within a given ROI as features for the different classifiers. A group-level one-sample t-test per time point was performed on the decoding accuracy results, corrected for multiple comparisons using a cluster-based correction106.
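The smoothing and pseudotrial steps can be sketched as follows. This is a minimal NumPy illustration, assuming that the 0.01-s value is the step of the sliding window (one sample at 100 Hz, with a five-sample window) and that pseudotrials are averages of random, non-overlapping groups of same-class trials. The group size of four and the function names are hypothetical.

```python
import numpy as np


def moving_average(data, window):
    """Average over a sliding window along the last (time) axis, step of one sample."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="valid"), -1, data
    )


def make_pseudotrials(trials, group_size, rng):
    """Average random, non-overlapping groups of trials from a single class.

    trials has shape (n_trials, n_features, n_times); leftover trials that do
    not fill a complete group are dropped.
    """
    order = rng.permutation(trials.shape[0])
    n_groups = trials.shape[0] // group_size
    groups = order[: n_groups * group_size].reshape(n_groups, group_size)
    return trials[groups].mean(axis=1)


rng = np.random.default_rng(0)
face_trials = rng.normal(size=(160, 30, 220))      # simulated trials x vertices x samples
smoothed = moving_average(face_trials, window=5)   # 0.05 s at 100 Hz
pseudo = make_pseudotrials(smoothed, group_size=4, rng=rng)
print(smoothed.shape, pseudo.shape)
```

Averaging trials within a class trades the number of samples for a higher signal-to-noise ratio per sample.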
The overall decoding strategy for fMRI was similar to that used on the iEEG and MEG data, with some differences. A multivariate pattern analysis approach was used on the pattern of BOLD activity over voxels. A non-spatially smoothed parameter estimate map was obtained by fitting a GLM per event with that event as the regressor of interest and all the other remaining events as one regressor of no interest108, as implemented in the NiBetaSeries (0.6.0) package. The model also included the 24 nuisance regressors described in the ‘fMRI preprocessing’ section.
Decoding was performed using whole-brain and ROI-based approaches. The whole-brain analysis was performed using a searchlight approach with 4-mm radius. For ROI-based decoding, decoding ROIs were defined based on functional fMRI contrasts (see the ‘fMRI preprocessing’ section) and constrained with predefined anatomical ROIs (see Extended Data Table 2 on anatomical ROIs). A one-sample permutation test was used to determine whether decoding significantly exceeded chance level within each ROI. FDR was used to correct for multiple comparisons across ROIs. For whole-brain decoding, a cluster-based permutation test was used to evaluate the decoding statistical significance across participants (P < 0.05), complemented by Bayesian analysis. In addition, stimulus versus baseline searchlight decoding was performed using leave-one-run-out cross-validation, and the resultant decoding accuracy maps were used as input for the multivariate putative NCC analysis (see below). To perform stimulus versus baseline decoding, we subsampled the stimulus trials to a 2:1 ratio with respect to baseline. The SVM cost function was weighted by the number of trials from each class. Plots were generated using Matplotlib (3.3.2)109.
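The FDR procedure is not spelled out in this section. Assuming the Benjamini–Hochberg step-up procedure, correction across ROIs could be sketched as below; the function name and the example P values are hypothetical.

```python
import numpy as np


def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean array marking which hypotheses are rejected at FDR level q.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank that meets its threshold.
        last = np.max(np.nonzero(below)[0])
        reject[order[: last + 1]] = True
    return reject


# One hypothetical permutation P value per ROI.
roi_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(fdr_bh(roi_p, q=0.05))
```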
Decoding schemes for the different predictions
To test GNWT and IIT decoding predictions, stimulus category (faces versus objects and letters versus false fonts) was decoded separately for the task-relevant and task-irrelevant conditions (within-task category decoding), whereas orientation (front view versus left view versus right view) was decoded on the combined data from the two task conditions. In addition, cross-task category decoding from the task-relevant to the task-irrelevant condition and vice versa was performed to test generalization by training classifiers on one condition and testing on the other condition. Both within-task category and orientation decoding were performed in a leave-one-run-out cross-validation scheme for fMRI and in a k-fold cross-validation scheme for MEG and iEEG.
For category decoding, trials from each task condition (that is, task relevant or task irrelevant) were extracted for each category comparison of interest: 160 face/160 objects classification, 160 letters/160 false-fonts classification within each task-relevant condition for MEG, and half the trials for iEEG. For fMRI, there were 64 trials for each category in each task-relevant condition. For orientation decoding, task-relevant and task-irrelevant trials were collapsed within category to increase the signal-to-noise ratio, resulting in 160 front, 80 left and 80 right trials per category for MEG, and half these numbers for iEEG. For fMRI, there were 64 front and 32 left and right trials per category. Decoding was evaluated using accuracy measures, tested against 50% chance level for category decoding (binary classification) and against 33% chance level for orientation decoding (three-class classification). For orientation decoding, balanced accuracy was used due to the unbalanced number of trials for the different orientations. The SVM cost function was weighted by the number of trials per class to reduce bias to the class with the highest number.
$$\text{Balanced accuracy}=\frac{1}{3}\left({\text{Sensitivity}}_{\text{front}}+{\text{Sensitivity}}_{\text{right}}+{\text{Sensitivity}}_{\text{left}}\right)$$
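The formula above can be illustrated with a short NumPy sketch; the function name and the toy predictions are hypothetical. A classifier that always answers 'front' on 160 front, 80 left and 80 right trials reaches 50% plain accuracy but only the 33% chance level in balanced accuracy.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred, classes=("front", "left", "right")):
    """Mean of the per-class sensitivities (recalls)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitivities = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(sensitivities))


y_true = ["front"] * 160 + ["left"] * 80 + ["right"] * 80
y_pred = ["front"] * 320                      # degenerate classifier
plain = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
print(plain, balanced_accuracy(y_true, y_pred))
```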
For within-task decoding (for example, classification of categories across time), a classifier at each time point was trained and tested separately using a fivefold cross-validation (with three separate repeats of cross-validation). For cross-task decoding (task relevant → irrelevant and task irrelevant → relevant), each SVM model was trained on one task (for example, faces–objects in the task-relevant condition) and tested on the second task (for example, faces–objects in the task-irrelevant condition). As cross-decoding in iEEG data is performed across all pooled electrodes, an additional cross-validation step was performed on data from this modality to provide a confidence metric (for example, confidence intervals) using a fivefold cross-validation with three repetitions (for example, train on 80% of task 1, and test on held-out 20% of task 2).
Within-task temporal generalization was performed by training a classifier at each time point (using selectKbest feature selection) and testing its performance across all time points using the same set of selected features and three repetitions of fivefold cross-validation. To generalize from one task to another across all time points, cross-temporal generalization was used: a classifier was trained at each time point in task 1 (for example, task relevant) using selectKbest feature selection, and tested across all time points in task 2 (for example, task irrelevant) using the same set of selected features. Cross-validation was performed in the same manner as in cross-decoding.
Additional decoding analyses were performed on all trials aligned to the stimulus onset (for example, −0.2 to 2 s relative to stimulus onset) and stimulus offset (−0.5 to 0.5 s around stimulus offset). For the latter analysis, all trials from different durations were aligned to the stimulus offset.
To assess the prediction of IIT that including prefrontal regions along with posterior regions in the decoding of categories will not significantly affect decoding accuracy, we performed an additional decoding analysis in which the decoding performance of electrodes from the IIT region was compared with the decoding performance when electrodes from both the posterior and PFC ROIs were included. The PFC ROIs included all PFC ROIs except for the inferior frontal sulcus, as it belongs to the IIT extended ROIs. The posterior ROI included all IIT ROIs shown in Supplementary Table 26. The analysis compared the decoding accuracy for a model including all electrodes from posterior regions to a separate model in which electrodes (features) from posterior and PFC regions were combined (for example, feature combination). Training and testing of the individual models followed all previously described cross-validation procedures, and model comparison was performed using a variance-corrected paired t-test110 and complemented with Bayesian analysis.
We also tested this prediction on the fMRI data. To select features to be used for both analyses, the face versus object contrast for each participant was masked by predefined anatomical posterior ROIs as well as PFC anatomical ROIs, defined in the same way as described above. Within each of the two ROIs, the 150 voxels that were most selective to each of the to-be-decoded stimuli were defined as the decoding ROIs (300 voxels total) for each participant. The first analysis compared the decoding accuracies for a model that included 300 voxels from the posterior ROIs as features to another model that included 600 voxels (300 features from each ROI). In the second analysis, two separate models were constructed, calibrated and combined as described above. For the two analyses, model comparison was performed using a group-level one-sample permutation test to determine whether accuracies obtained by combining posterior and PFC ROIs were significantly higher than the accuracies obtained based on posterior ROIs only. FDR was used to correct for multiple comparisons. Bayesian analysis was performed to quantify evidence for the null hypothesis that adding prefrontal ROIs will not improve decoding accuracy.
Duration analysis
Neural responses were extracted from three windows of interest (0.8–1.0 s, 1.3–1.5 s and 1.8–2.0 s) and compared using LMMs. Four theory agnostic models were fitted: a null model, a duration model (three durations), a windows of interest model, and a duration and windows of interest model. Two theory models were fitted: the GNWT model predicts activation (ignition) following stimulus offset (0.3–0.5 s) independent of duration, with virtually no response in between. The IIT model predicts sustained activation for the duration of the stimulus returning to baseline after stimulus offset. Both theoretical models were complemented with an interaction term between category (faces, objects, letters and false fonts) and the theories’ predictors, to account for regions showing selective responses to categories. BIC was used to define the winning model and we computed Bayes factors based on the difference in BIC values, comparing the GNWT model (with or without interaction) against either the null model (intercept only) or the time-window model (capturing amplitude changes over time)111.
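A Bayes factor can be approximated from a difference in BIC values. The sketch below assumes the widely used approximation BF10 ≈ exp((BIC_0 − BIC_1)/2); the function names and the BIC values are hypothetical, and the exact formula used in the study is not restated in this section.

```python
from math import exp, log


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion of a fitted model."""
    return n_params * log(n_obs) - 2.0 * log_likelihood


def bf_from_bic(bic_reference, bic_candidate):
    """Approximate Bayes factor in favour of the candidate model.

    Values above 1 favour the candidate (lower BIC); values below 1 favour
    the reference model.
    """
    return exp((bic_reference - bic_candidate) / 2.0)


# Hypothetical BIC values for a null (intercept-only) model and a theory model.
print(bf_from_bic(bic_reference=106.0, bic_candidate=100.0))
```

A BIC difference of 6 in favour of the candidate model corresponds to a Bayes factor of about exp(3), or roughly 20.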
Models for iEEG were fitted per electrode on the predefined ROIs, using the high-gamma (AUC), alpha (8–13 Hz, obtained through Morlet wavelets, f = 8–13 Hz, in 1-Hz steps; f/2 cycles, AUC) and ERPs (peak to peak) as signal, separately for task-relevant and task-irrelevant conditions.
MEG models were fitted to source data on the predefined ROIs, using the gamma (60–90 Hz) and alpha (8–13 Hz) bands as signal, separately for task-relevant and task-irrelevant conditions. Time-frequency analyses were performed on source-data using Morlet wavelets (f = 8–13 Hz, in 1-Hz steps; f/2 cycles; f = 60–90 Hz, in 2-Hz steps, f/4 cycles) and were baseline corrected. Spectral activity was computed for each vertex, baseline corrected and then averaged across trials within each parcel included in the ROIs, yielding a unique time course per ROI parcel. In addition, a single-source time course capturing the entire prefrontal ROI and the posterior ROI was computed by averaging the spectral activity within an ROI. Models were fitted on each parcel and ROI, as defined by the theories.
Representational similarity analysis
To examine how the neural representations evolved over time in response to the different stimulus properties (that is, category, orientation and identity representation), we performed cross-temporal RSA on source-level MEG data and iEEG high-gamma power within each of the theory-defined ROIs, using all trials. Specifically, at each set of data points, we computed a representational dissimilarity matrix (RDM) by calculating the correlation distance (1 − Pearson’s r, Fisher corrected) between all pairs of stimuli (the preregistration document described a different method that was however updated to optimize trial numbers; see section 14 in Supplementary Information for justification). Next, to quantify the representational space occupied by one class versus another, we computed the average within-class distances versus the average between-class distances. This analysis was performed in a cross-temporal manner, in which RDMs were computed between all stimuli at time point t1 and the corresponding set of stimuli at time points t1, t2,…tn.
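The within-class versus between-class comparison can be sketched for a single pair of time points as follows. This is a simplified NumPy illustration with simulated patterns: the Fisher correction mentioned above is omitted, the diagonal (same stimulus at both time points) is excluded from the within-class average, and the function names are hypothetical.

```python
import numpy as np


def correlation_distance_rdm(patterns_a, patterns_b):
    """1 - Pearson r between every row of patterns_a and every row of patterns_b.

    Rows are stimuli and columns are features (electrodes or vertices). Passing
    patterns from two different time points gives one cross-temporal RDM.
    """
    a = patterns_a - patterns_a.mean(axis=1, keepdims=True)
    b = patterns_b - patterns_b.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def between_minus_within(rdm, labels):
    """Average between-class distance minus average within-class distance."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    return rdm[~same].mean() - rdm[same & off_diagonal].mean()


# Simulated data: two classes with distinct templates plus noise.
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
rng = np.random.default_rng(0)
patterns = np.vstack([
    np.sin(t) + 0.1 * rng.normal(size=(5, 50)),
    np.cos(t) + 0.1 * rng.normal(size=(5, 50)),
])
labels = ["face"] * 5 + ["object"] * 5
rdm = correlation_distance_rdm(patterns, patterns)
print(between_minus_within(rdm, labels))
```

Repeating the computation for every pair of time points (t1, t2) fills the cross-temporal matrix.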
Long trials (1.5 s) were used to investigate category and orientation representation. As specific identities were repeated a limited number of times per duration, both intermediate (1.0 s) and long (1.5 s) trials were combined and equated in duration by cropping the 1–1.5-s time interval for long trials. This was done to allow for the analysis of at least three (3) presentations of the same identity.
To evaluate the theoretical predictions about when significant content representation should occur, we subsampled the observed cross-temporal representational matrices in four time windows (0.3–0.5 s, 0.8–1.0 s, 1.3–1.5 s and 1.8–2.0 s). The subsampled matrices were correlated to the model matrices predicted by GNWT and IIT (see Fig. 1a, right panel) using Kendall’s tau correlation. If the correlation was significant (see below) for at least one of the predicted matrices, we computed the difference between the transformed correlation (\((r+1)/2\)) to each theory, and compared this difference against a random distribution to obtain a P value. If the correlation with the theory-predicted pattern in the theory ROI was significantly higher than the other model, we considered the theory prediction to be fulfilled.
To generate a null distribution of cross-temporal RSA surrogate matrices, we repeated the procedure outlined above 1,024 times, randomly shuffling the labels. Next, the observed RSA matrix was z-scored using the null distribution as:
$${z}_{i,j}=\frac{{{\rm{obs}}}_{i,j}-{\mu }_{{{\rm{surr}}}_{i,j}}}{{\sigma }_{{{\rm{surr}}}_{i,j}}}$$
where \({{\rm{obs}}}_{i,j}\) is the observed within-versus-between class difference at time points i and j, and \({\mu }_{{{\rm{surr}}}_{i,j}}\) and \({\sigma }_{{{\rm{surr}}}_{i,j}}\) are the mean and standard deviation of the surrogate representational similarity matrix at time points i and j, respectively. Cluster-based permutation tests112, with a z-score threshold of z = 1.5 for clustering, were used to evaluate significance. RSA surrogates were also used to assess the significance of the correlation between the observed matrices and the predicted matrices of the theories. First, a null distribution of possible correlations was generated for each of the theories by correlating each of the surrogate matrices to each of the theory-predicted matrices. Next, a P value was obtained for each theory-predicted matrix, by locating its observed correlation within the null correlation distribution. The same procedure was used to assess the significance of the difference in correlation to IIT and GNWT matrices (for example, each of the surrogate matrices was correlated to each of the theory-predicted matrices and the difference between the two was computed). P values were FDR corrected (q ≤ 0.05)75.
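The z-scoring against surrogates and the location of an observed statistic within a null distribution can be sketched as below. The function names are hypothetical, the population standard deviation is assumed, and the add-one correction in the P value is a common convention rather than a detail stated here.

```python
import numpy as np


def zscore_against_surrogates(observed, surrogates):
    """z-score each cell (i, j) of the observed matrix using the surrogate
    mean and standard deviation of the same cell.

    observed has shape (n_times, n_times); surrogates has shape
    (n_surrogates, n_times, n_times).
    """
    mu = surrogates.mean(axis=0)
    sigma = surrogates.std(axis=0)
    return (observed - mu) / sigma


def permutation_p_value(observed_stat, null_stats):
    """One-sided P value: share of null statistics at least as large as observed."""
    null_stats = np.asarray(null_stats)
    return (1 + np.sum(null_stats >= observed_stat)) / (1 + null_stats.size)


rng = np.random.default_rng(0)
surrogates = rng.normal(size=(1024, 4, 4))   # simulated label-shuffled matrices
observed = np.full((4, 4), 3.0)              # simulated observed matrix
z = zscore_against_surrogates(observed, surrogates)
print(z.round(2))
```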
For iEEG, the high-gamma power per electrode within the predefined anatomical ROI was averaged in 0.02-s non-overlapping windows. Electrodes were used as features for the RDM. The data were vectorized across all electrodes within an ROI (for example, samples × significant electrodes) to compute the RDMs. A total of 576 and 583 electrodes entered this analysis for the prefrontal and posterior ROI, respectively. The resultant RDM was subjected to a PCA, and the first two dimensions were plotted against each other to produce a 2D projection of dissimilarity scores across all pairs for each of the 100 subsampling repetitions. The PCA components were aligned across repetitions using Procrustes alignment and averaged together for visualization purposes113,114.
For MEG, the same analysis was run on the source reconstructed data within the predefined anatomical ROIs used for the decoding analysis, bandpass filtered (1–40 Hz) and downsampled (100 Hz). For the category and orientation analysis, pseudotrials and temporal moving-average methods were used to optimize the RSA analysis and improve the signal-to-noise ratio. For identity, single trials were used. Vertices within the ROIs were used as features. The statistical testing differed from that conducted on the iEEG data, as it was performed at the participant level. Similarly to the iEEG analysis, we first tested whether the correlation between the data and the model predicted by each theory was greater than zero using the Kendall’s tau measure, and then compared between the theories using the Mann–Whitney U rank test on two independent samples.
Functional connectivity analysis
For both iEEG and MEG, PPC46 was computed between each category-selective time series (face selective and object selective) and either the V1/V2 or the PFC time series.
For iEEG, the PPC analysis included electrodes in V1/V2 visual areas, in PFC ROIs (see Supplementary Table 26), and face-selective and object-selective electrodes (see ‘Identification of task-responsive channels’), as long as they were ‘active’ during the task. As both theories predict different types of activation (for example, ignition versus sustained activation), channels were categorized as active if they showed an increase in high-gamma power relative to baseline (−0.5 to −0.3 s, P < 0.05, signed-rank test) evaluated across all trials (task relevant + irrelevant, intermediate + long trials, combined across both categories), for the 0.3–0.5-s window (GNWT), or in all time windows: 0.3–0.5 s, 0.5–0.8 s and 1.3–1.5 s (IIT).
For MEG, the category-selective single-trial time courses used to define the ROIs for PPC analysis were extracted using the generalized eigenvalue decomposition (GED) method115. Two GED spatial filters were built by contrasting either faces or objects against all other categories during the first 0.5 s after stimulus onset. Single-trial covariance matrices were computed separately for signal and reference for all vertices within the fusiform ROI identified from the FreeSurfer parcellation using the Desikan atlas116, and the Euclidean distance between them was z-scored. Trials exceeding 3 z-scores were excluded. The reference covariance matrix was regularized to reduce overfitting and increase numerical stability. The GED was then performed on the two covariance matrices, resulting in n (=rank of the data) pairs of eigenvectors and eigenvalues. The eigenvector associated with the highest eigenvalue was selected as a GED spatial filter, which in turn was applied to the data to compute the single-trial GED component time series. A GED spatial filter was also extracted for the PFC ROI, on parcels from the Destrieux atlas73, to identify the distributed pattern of sources that are responsive to visually presented stimuli. Specifically, a spatial filter was built by contrasting source-level frontal slow-frequency activity (30-Hz low-pass filter) after stimulus onset (0–0.5 s) against baseline (−0.5 to 0 s). V1/V2 areas were identified using the Wang Atlas74 and a singular-value decomposition approach. For the GED, the 1.0-s and 1.5-s duration trials were used to minimize overlap with the transient evoked at stimulus onset.
PPC was computed for each MEG time series–iEEG electrode pairing, for all face trials and object trials separately. Analyses were performed on 1.0-s and 1.5-s duration trials, separately on task-relevant and task-irrelevant trials and also combined to maximize statistical power. To compute synchrony, time-frequency analysis of the broadband MEG and LFP signal was performed using Morlet wavelets (f = 2–30 Hz, in 1-Hz steps; 4 cycles; f = 30–180 Hz for iEEG or f = 30–100 Hz for MEG, in 2-Hz steps, f/4 cycles), and PPC was then computed by taking the difference in phase angle between MEG time series–iEEG electrode at each time t and frequency f for a specific trial and computing PPC across all trials in a category (for example, faces) as:
$$\text{PPC}(f,t)=\frac{2}{N(N-1)}\sum_{j=1}^{N-1}\sum_{k=j+1}^{N}\cos \left({\theta }_{j}(f,t)-{\theta }_{k}(f,t)\right),\quad j=\{1,\ldots ,N\,\text{trials}\}$$
where \({\theta }_{j,k}(f,t)={\theta (f,t)}_{e1\,\text{or GED filter}}-{\theta (f,t)}_{e2\,\text{or GED filter}}\), for all frequencies f and at all times t.
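For a single frequency and time point, the PPC formula can be sketched in NumPy as follows, taking as input one relative phase (the phase difference between the two sites) per trial. The function names and the simulated phases are hypothetical. The second function uses the identity |Σ_j exp(iθ_j)|² = N + 2 Σ_{j<k} cos(θ_j − θ_k), which gives the same value without forming all trial pairs.

```python
import numpy as np


def ppc_pairwise(relative_phase):
    """PPC as the average cosine of the phase difference over all trial pairs."""
    n = relative_phase.size
    diffs = relative_phase[:, None] - relative_phase[None, :]
    upper = np.triu_indices(n, k=1)          # pairs with j < k
    return 2.0 / (n * (n - 1)) * np.cos(diffs[upper]).sum()


def ppc_closed_form(relative_phase):
    """Equivalent PPC computed from the squared length of the summed unit vectors."""
    n = relative_phase.size
    resultant_sq = np.abs(np.exp(1j * relative_phase).sum()) ** 2
    return (resultant_sq - n) / (n * (n - 1))


rng = np.random.default_rng(0)
locked = np.full(50, 0.3)                            # identical relative phase on every trial
unlocked = rng.uniform(-np.pi, np.pi, size=200)      # no consistent phase relation
print(ppc_pairwise(locked), ppc_pairwise(unlocked))
```

Perfectly consistent relative phases give a PPC of 1, whereas uniformly distributed phases give values near 0.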
For iEEG, PPC for each category-selective site was then averaged across all its pairings (for example, all PFC electrodes pairings or all V1/V2 pairings within that patient). The variability in electrode coverage across patients precluded a within-participants analysis. Therefore, to achieve sufficient statistical power, we pooled all derived PPC values from one electrode pairing (for example, face selective to the PFC) across all patients into one ROI-specific analysis. A similar approach was used on the MEG parcels.
To quantify content-specific synchrony enhancement, the difference in PPC was computed between within-category and across-category trials (for example, for face-selective sites, the change in PPC was computed between faces versus objects trials) using a cluster-based permutation test106. This was done for both modalities.
As an exploratory analysis, we also investigated dynamic functional connectivity (DFC) using the Gaussian copula mutual information117 approach to evaluate the dependencies between time series. This power-based measure of connectivity was implemented using the conn_dfc method from the Frites Python package118. We used the same parameters as for the PPC analysis, with the following exceptions: for both MEG and iEEG, power was estimated through a multitaper-based method (using a frequency-dependent dynamic sliding window: 2–30 Hz, T = 4 cycles; 30–100 Hz, T = 4/f using a 0.25-s sliding window). For iEEG, the high-frequency range was extended from 30 to 180 Hz (T = 4/f cycles). DFC was performed per frequency band, using a 0.1-s sliding window and 0.02-s steps.
For fMRI, connectivity was assessed through gPPI implemented in SPM119. The FFA and lateral occipital cortex (LOC) were defined as seed regions per participant based on an anatomically constrained functional contrast. Anatomically, FFA seeds were constrained to the ‘inferior occipital gyrus (O3) and sulcus’ and ‘lateral occipito-temporal gyrus (fusiform gyrus; O4–T4)’. LOC seeds were constrained to the ‘middle occipital gyrus (O2; lateral occipital gyrus)’ and the ‘middle occipital sulcus and lunatus sulcus’ (Destrieux ROIs 2 and 21 for FFA, and ROIs 19 and 57 for LOC; see ‘Anatomical ROIs’).
Candidate seed voxels within the above-mentioned anatomical ROIs were defined as those with z > 1 in the contrast of parameter estimates of all stimuli versus baseline. Three participants with fewer than 300 candidate seed voxels were excluded from the analysis. This was done to ensure that the seed voxels were visually driven. Next, using an unthresholded contrast of parameter estimates between ‘relevant and irrelevant faces’ and ‘relevant and irrelevant objects’, the 300 voxels most responsive to faces within the FFA anatomical ROIs were selected for the FFA seed, and the 300 voxels most responsive to objects within the LOC anatomical ROIs were selected for the LOC seed.
gPPI analysis was performed per participant and seed region separately, including an interaction term between the seed time-series regressor (physiological term) and the task regressor (psychological term) at the participant-level GLM119, separately for task-relevant and task-irrelevant conditions, and also combining across tasks to increase statistical power. For combined conditions, the model design matrix for each participant included regressors for faces, objects, letters and false fonts collapsed across the task-relevant and task-irrelevant conditions (four regressors) as well as a regressor for targets (irrespective of their category), yielding five regressors in total. For separated conditions, the model design matrix included regressors for task-relevant and task-irrelevant faces, objects, letters and false fonts (eight regressors) as well as a regressor for targets (irrespective of their category), yielding nine regressors in total. For each seed, group-level analysis was performed using a cluster-based permutation test to evaluate the statistical significance of face > object contrast parameter estimates across participants (P < 0.05), complemented by Bayesian analysis. The cluster-based permutation test was preferred over the preregistered FDR correction; see section 14 in Supplementary Information for a justification of this change.
Putative NCC analyses
A series of conjunction analyses were performed on the fMRI data to identify (1) areas responsive to task goal, (2) areas responsive to task relevance, and (3) areas putatively involved in the neural correlates of consciousness. We note that the contrasts proposed below might overestimate the neural correlates of consciousness and that the fast-event-related design adopted here might be suboptimal to detect activity changes in the salience network120, that is, potentially underestimating some regions that might be involved in conscious processing. We therefore have adopted a conservative approach that distinguishes between areas that might participate in consciousness versus those that definitely do not.
The conjunction defining areas responsive to task goals was defined as [TaskRelTar > bsl] and [(TaskRelNonTar = bsl) and (TaskIrrel = bsl)]. This contrast captures areas that show an increase of BOLD signal for targets but not for other stimuli. The following conjunction identified areas responsive to task relevance: [(TaskRelTar > bsl) and (TaskRelNonTar ≠ bsl)] and [TaskIrrel = bsl]. This contrast identifies areas that display differential activity for all task-relevant stimuli but are insensitive to non-task-relevant stimuli. Finally, the following conjunction was used to identify the putative NCC areas: [(TaskRelNonTar (stim id) > bsl) and (TaskIrrel (stim id) > bsl)] or [(TaskRelNonTar (stim id) < bsl) and (TaskIrrel (stim id) < bsl)], critically detecting areas that are responsive to any stimulus category irrespective of task, with consistent activation or deactivation. Thus, this analysis casts a wide net to identify areas that can potentially be the neural correlates of consciousness, while excluding areas that do not respond to task-relevant or irrelevant stimuli (meaning that areas that respond both to the task and to the content of perception are still included).
To compute conjunctions, we first ran a GLM (see above) corrected for multiple comparisons (Gaussian random-field cluster-based inference). Equivalence to baseline was established using a JZS Bayes factor test, with a Cauchy prior (r scale value of 0.707), as implemented in Pingouin (0.5.1)121. Evidence maps were thresholded at BF01 > 3. The thresholded z maps and the Bayesian evidence maps on the group level were used for the conjunction analysis. For conjunctions including an ‘unequal to’, a ‘logical and’ operation was used between the directional z maps, after thresholded maps were binarized. For the putative NCC contrast, conjunctions were performed separately for activations and deactivations, using a ‘logical and’ operator for the task-relevant and irrelevant z maps. The resulting maps were combined using a ‘logical or’ operation to discard areas showing effects of opposite direction for task-relevant and task-irrelevant stimuli. This analysis was also done at the participant level, masked using the anatomical ROIs, to account for inter-participant variability. For each ROI, the proportion of participants with voxels included in the conjunction is reported. The multivariate version of the putative NCC analysis was done using the thresholded statistical maps obtained from the whole-brain searchlight decoding based on participant-level stimulus versus baseline decoding accuracy maps (for details regarding the decoding approach used, see ‘Decoding analysis’).
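The logical operations behind the conjunctions can be sketched on toy voxel arrays as follows. This is a simplified illustration: a plain voxel-wise z threshold stands in for the cluster-corrected maps, the threshold of 3.1 and the function names are hypothetical, and only the task-goal and putative NCC conjunctions are shown.

```python
import numpy as np


def task_goal_mask(z_target, bf01_nontarget, bf01_irrelevant,
                   z_thresh=3.1, bf_thresh=3.0):
    """[TaskRelTar > bsl] and [(TaskRelNonTar = bsl) and (TaskIrrel = bsl)].

    Equivalence to baseline is read from Bayesian evidence maps (BF01 > bf_thresh).
    """
    return (
        (z_target > z_thresh)
        & (bf01_nontarget > bf_thresh)
        & (bf01_irrelevant > bf_thresh)
    )


def putative_ncc_mask(z_relevant, z_irrelevant, z_thresh=3.1):
    """Same-sign effects in both task conditions: (act and act) or (deact and deact)."""
    activation = (z_relevant > z_thresh) & (z_irrelevant > z_thresh)
    deactivation = (z_relevant < -z_thresh) & (z_irrelevant < -z_thresh)
    return activation | deactivation


# Four toy voxels: consistent activation, consistent deactivation,
# opposite directions, and a response in one condition only.
z_relevant = np.array([4.0, -4.0, 4.0, 0.0])
z_irrelevant = np.array([4.0, -4.0, -4.0, 4.0])
print(putative_ncc_mask(z_relevant, z_irrelevant))
```

Only the first two voxels survive, because voxels with effects of opposite direction or with a response in a single condition are discarded.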
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.