Thursday, June 25, 2026

Disparate privacy risks from medical AI

Medical artificial intelligence (AI) has immense potential to improve health outcomes, particularly in regions in which specialized medical expertise is scarce1. At the same time, AI also poses new challenges and risks, including security vulnerabilities that arise when models are deployed. Untrusted users with access to an AI model may, by merely observing its predictions, steal its parameters8,9 or perform privacy attacks2,3,4,5,6,7, which can extract sensitive details about the data used for model training.

Privacy attacks against an AI model can enable detailed inferences about the individuals who contributed to its training data. For example, a membership inference attack (MIA)2 attempts to determine whether the data of a specific patient were included in the training dataset of a model. The extent to which this constitutes a privacy violation is nuanced and depends on factors such as the underlying training population and the deployment context of the model. Although inferring membership for a model trained on a general population may be benign, doing so for a model trained on a narrow, disease- or centre-specific cohort acts as a direct proxy for sensitive medical information. For example, a successful MIA against the model in ref. 10, which predicts anti-cancer immunotherapy efficacy from routine blood test data, reveals that an individual has cancer.

The accelerating deployment of medical AI models trained on sensitive patient data11 calls for rigorous privacy risk assessments. However, previous studies primarily quantified the success rate of MIAs, in aggregate, across all records in a training dataset. This implicitly averages risk across records, thereby obscuring important information on record- and patient-level attack success. Consequently, the risk that an individual faces by contributing their personal data (often multiple records) to an AI training dataset is poorly understood. Given that medical data are a key target for cybercriminals12,13, and pseudonymization alone is increasingly recognized as insufficient to prevent the re-identification of individuals in large, high-dimensional datasets14,15,16, there is a need to improve our understanding of the threat that AI privacy attacks pose to individual patients.

Here we show that deploying medical AI models without protective measures can pose substantial privacy risks to individual data-contributing patients. These risks are particularly acute when membership in a training population itself reveals sensitive medical information. Our privacy audit of AI models trained to perform standard diagnostic (supervised classification) tasks quantifies state-of-the-art MIA success3,4 at the resolution of individual data contributors. Using seven large datasets comprising real-world clinical data, including various types of medical images, electrocardiograms and electronic health records, we demonstrate that the success of a MIA is unequally distributed among data-contributing patients. We show that this disparity exists at two levels: (1) the individual patient level, at which some patients experience near-perfect attack success, whereas others remain essentially unaffected; and (2) the group level, at which patient groups underrepresented in a training dataset are often overrepresented among records most vulnerable to MIAs.

Together, our results indicate that privacy attacks against AI models may be much more effective at compromising the privacy of individual data contributors than previously thought. This suggests that current AI privacy risk reporting practices may underestimate individual-level risk and thus motivates the integration of mathematically verifiable risk mitigation strategies such as differential privacy (DP) into medical AI model development workflows.

Attacking AI by simple hypothesis tests

A popular deployment strategy for AI models gives users access to a model through a prediction interface, which, for a given input (for example, the chest radiograph of a patient), returns a corresponding prediction (for example, a 78% chance of pneumonia). This black-box access to a model can be exploited by an untrusted user to conduct a MIA that shows the membership status of a target record, that is, whether the target record was a member of the training dataset of a model or not (Fig. 1a). To infer membership status, MIAs typically make use of the fact that AI models are often slightly more confident about their predictions on training than on non-training data.

Fig. 1: MIA and evaluation strategies.

a, Schematic of a MIA, in which an untrusted user, only by observing the predictions of a model, aims to infer whether a specific target record was part of the training dataset. The attack is considered successful if the untrusted user can reliably distinguish between model A and model B, which are identical except for the inclusion and exclusion of the target record in the respective training dataset. b,c, Attack success can be measured either, in aggregate, across all records in the dataset (b) or, more granularly, for each record individually across many target models (c).

Likelihood-ratio MIAs (LR-MIAs)3,4, the current state-of-the-art in MIAs, frame membership inference as a simple vs. simple hypothesis testing problem on the prediction confidence provided by the target model. In essence, LR-MIAs compare the likelihood of the predicted confidence of the target model for the target record under the null hypothesis (the target record was not a member) and the alternative hypothesis (the target record was a member). Here, the parameters of the distributions under the two hypotheses are specified by parametric fitting of sample confidence values obtained from reference models. Reference models are models assumed to be trained by the attacker and are ideally, but not necessarily, of similar architecture to the target model and trained on data similar to the training dataset of the target model.
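The likelihood-ratio test described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes Gaussian fits by sample mean and standard deviation, and the function names, the confidence values and the decision threshold of zero are all hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_log_pdf(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def log_likelihood_ratio(target_conf, conf_in, conf_out):
    """log p(conf | member) - log p(conf | non-member) under Gaussian fits.

    conf_in / conf_out are confidences that reference models trained with /
    without the target record assign to that record.
    """
    ll_in = gaussian_log_pdf(target_conf, mean(conf_in), stdev(conf_in))
    ll_out = gaussian_log_pdf(target_conf, mean(conf_out), stdev(conf_out))
    return ll_in - ll_out


# Hypothetical reference-model confidences for one target record.
conf_in = [0.91, 0.95, 0.93, 0.97, 0.94]
conf_out = [0.80, 0.86, 0.83, 0.89, 0.84]

# Confidence returned by the (hypothetical) target model for the target record.
llr = log_likelihood_ratio(0.95, conf_in, conf_out)
predicted_member = llr > 0.0  # a threshold of 0 is an illustrative choice
```

In practice the threshold would be chosen to hit a desired false-positive rate rather than fixed at zero.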

Note that objectively larger threats are posed by privacy attacks with stronger assumptions on a potential attacker, such as access to model parameters17, access to parameter updates during model training18 or, furthermore, the ability to modify the model architecture19,20. However, we do not consider them in this study as their strong assumptions are not realistic for careful, practical deployment scenarios. By contrast, the type of attack we consider here requires querying the target model only once (to obtain a prediction for the target record) and may thus be executed by any attacker posing as a real user of an AI system. Notably, as the attacks we study are executed against fully trained models, data-governance-preserving techniques such as federated/swarm-learning21 provide no protection.

From aggregate to patient-level risk

MIA performance is evaluated through a receiver operating characteristic (ROC) analysis22 on numerous repetitions of the MIA game scenario, in which an untrusted user is challenged to guess the membership status of a given record (Fig. 1a). In practice, owing to the computational cost of training AI models, attack success is typically evaluated using a single target model. More specifically, a target model is trained on a random subset of the training dataset, and subsequently, an ROC analysis is performed on the aggregated membership predictions for all records in the dataset (Fig. 1b). Although practical, this approach has a key shortcoming: it provides no indication of the performance of the attack for individual records or patients.
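For reference, the aggregate evaluation described above boils down to a single AUC over all pooled attack scores. The sketch below computes it with the rank-based (Mann-Whitney) formulation; the function name and the example scores are hypothetical, and the quadratic loop is chosen for readability rather than speed.

```python
def aggregate_auc(member_scores, non_member_scores):
    """Probability that a random member scores higher than a random non-member.

    Ties count as half a win, which matches the area under the empirical ROC curve.
    """
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Hypothetical attack scores pooled over all records of one target model.
members = [0.9, 0.8, 0.4]
non_members = [0.7, 0.3]
auc = aggregate_auc(members, non_members)
```

Because every record contributes a single score to this pooled statistic, a few highly vulnerable records can be hidden by many records near chance level.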

To address this issue, we propose a simple technique for estimating record-, and by extension, patient-level vulnerability to LR-MIAs (Fig. 1c). In brief, using a large set of target models (N = 200) trained on random patient subsets, we estimate, for each training record, sampling distributions of the confidence of the target model under the null and alternative hypotheses in LR-MIAs. In other words, we estimate empirical distributions of confidence values as provided by target models, partitioned into models trained and not trained on the target record. Because these distributions are assumed to take Gaussian form in LR-MIAs3,4, record-level attack success, as measured by the area under the ROC curve (AUC), can be calculated in closed form (Methods). A high AUC score, close to the maximum value of 1.0, suggests high privacy risk: a MIA for this record could achieve high sensitivity with little to no false positives. Notably, the record-level MIA AUC also offers a probabilistic interpretation22: the record-level MIA AUC is the probability that a confidence score from a target model trained on the target record is larger than a score from a target model not trained on the target record.
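Under the Gaussian assumption, the closed form follows from the standard bi-normal AUC: the difference between a member and a non-member confidence is itself Gaussian, so AUC = Φ((μ_in − μ_out) / sqrt(σ_in² + σ_out²)). The sketch below applies this to sample statistics; it is an illustration under that assumption, the exact estimator is given in Methods, and the function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def record_level_auc(conf_in, conf_out):
    """Bi-normal AUC estimated from sample means and standard deviations.

    conf_in: confidences from target models trained on the record.
    conf_out: confidences from target models not trained on the record.
    """
    mu_in, mu_out = mean(conf_in), mean(conf_out)
    var_in, var_out = stdev(conf_in) ** 2, stdev(conf_out) ** 2
    z = (mu_in - mu_out) / math.sqrt(var_in + var_out)
    return NormalDist().cdf(z)


# Hypothetical confidences for one record across a handful of target models.
conf_in = [0.91, 0.95, 0.93, 0.97, 0.94]
conf_out = [0.80, 0.86, 0.83, 0.89, 0.84]
auc = record_level_auc(conf_in, conf_out)
```

When the two samples have the same mean, the estimate is exactly 0.5, which corresponds to random guessing.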

Correctly determining the membership status for one of the records contributed by an individual patient reveals the membership status of the patient. Thus, we compute patient-level scores by taking the maximum across all record-level scores for a given patient. The raw record-level scores and the average patient-level scores can be found in Extended Data Fig. 1.
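The maximum-based aggregation is straightforward to express in code. The following sketch assumes record-level scores arrive as (patient identifier, AUC) pairs; the identifiers and values are made up for illustration.

```python
def patient_level_scores(record_scores):
    """Map each patient to the maximum record-level MIA AUC among their records."""
    best = {}
    for patient_id, auc in record_scores:
        best[patient_id] = max(auc, best.get(patient_id, 0.0))
    return best


# Hypothetical record-level scores; patient "p1" contributed three records.
records = [("p1", 0.52), ("p1", 0.97), ("p1", 0.61), ("p2", 0.55), ("p3", 0.49)]
patient_scores = patient_level_scores(records)
```

Taking the maximum reflects the worst case for the patient: one vulnerable record is enough to reveal their membership.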

Notably, our technique for measuring record-level attack success reduces to estimating the bi-normal AUC from sample statistics and thus has desirable statistical properties. Its standard error at the record level can be computed in closed form23 (Methods). As expected, using a total of N = 200 target models evenly split between null and alternative hypotheses for each record, the standard error of the record-level MIA AUC is small across all records in the investigated datasets (Extended Data Fig. 2).

Attacking open-source models

Recent advances in attack design have made LR-MIAs much more practical. To illustrate the practical feasibility of conducting MIAs, we demonstrate attacks against two chest radiograph models from the TorchXrayVision24 library. We used the Robust Membership Inference Attack4 (RMIA), an improved LR-MIA that requires only one or two reference models, compared with more than 100 for the Likelihood Ratio Attack3 (LiRA). RMIA achieves this efficiency gain by effectively using reference data (data similar to the target record) alongside the target record to query the target model. Crucially, the attack does not require knowledge of the membership status of the reference data.

We simulated a realistic attack setting in which an attacker lacks access to the training dataset of the target model to train reference models and is further constrained by computational resources. Specifically, we used only a single pre-trained PadChest25 model as a reference model to perform attacks against the CheXpert26 and MIMIC-CXR27 models of the library. In this setting, also known as an offline attack, an attacker incurs no computational cost in training the reference model. Instead, they simply need to obtain predictions from the reference model for both the target record and the reference data. This can be done efficiently on commodity hardware without a graphics processing unit (GPU). To conduct the attack, we queried the target model once to collect confidence values for all target records. Using this collection, we then computed RMIA test statistics for each target record by randomly selecting, independent of membership status, N = 500 confidence values from the other targets in this collection as reference data. This strategy would effectively conceal the additional reference data queries to the target model in a real attack.
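A rough sketch of the pairwise comparison behind this kind of attack is given below. It assumes that the score for a target record x is the fraction of reference points z for which the ratio of target-model to reference-model confidence on x exceeds the same ratio on z by a factor γ. This is a simplification: the offline correction of the reference-model confidence used by RMIA is omitted, and the function name, the default γ = 1 and all confidence values are hypothetical.

```python
def pairwise_ratio_score(target_conf_x, ref_conf_x, reference_pairs, gamma=1.0):
    """Fraction of reference points that the target record 'dominates'.

    target_conf_x: target-model confidence on the target record x.
    ref_conf_x: reference-model confidence on x.
    reference_pairs: (target-model confidence, reference-model confidence)
        for each reference point z.
    """
    ratio_x = target_conf_x / ref_conf_x
    dominated = 0
    for target_conf_z, ref_conf_z in reference_pairs:
        ratio_z = target_conf_z / ref_conf_z
        if ratio_x / ratio_z >= gamma:
            dominated += 1
    return dominated / len(reference_pairs)


# Hypothetical confidences; a real attack would draw N = 500 reference points.
reference_pairs = [(0.8, 0.8), (0.5, 0.7), (0.7, 0.35), (0.6, 0.5)]
score = pairwise_ratio_score(0.9, 0.6, reference_pairs)
```

A higher score means that the target model is unusually confident on x relative to the reference model, which is taken as evidence of membership.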

We evaluated attack success on a combined dataset of records from CheXpert and MIMIC-CXR (N = 25,000 each), which were labelled as members and non-members, respectively, for the CheXpert model (and vice versa for the MIMIC-CXR model). In this setting, RMIA achieved substantial aggregate success with respective AUC scores of 0.61 and 0.65 (Fig. 2a). Note that owing to the distribution shift between members and non-members, results from this evaluation setting are not directly comparable to the standard evaluation protocol in which members and non-members are sampled at random from the training dataset. Notably, however, such a distribution shift is expected in a real attack, and this setting is thus of high interest.

Fig. 2: MIAs pose substantial privacy risks to individual data-contributing patients.

a, Aggregate MIA success for a realistic attack against open-source CheXpert and MIMIC-CXR models (RMIA offline, R = 1 reference model pre-trained on PadChest). b, eSF analysis of patient-level MIA AUC scores computed using N = 200 target models for each dataset (residual networks, about 1.5 million parameters each). Patient-level scores are computed as the maximum record-level score for a given patient. c, ROC analysis of aggregate attack success (LiRA online, vertical-average mean for N = 10 target models, R = 190 reference models). d,e, eSF plots of patient-level MIA AUC scores alongside diagnostic performance (macro-average AUC) on unseen test data for N = 200 target models with varying levels of record-level (ε, δ)-DP privacy protection: PTB-XL (d) and EMBED (e). δ was kept constant at 1/D, where D is the dataset size. f,g, eSF plots of patient-level MIA AUC scores alongside diagnostic performance (macro-average AUC) on unseen test data for N = 200 target models of increasing model capacity: Fitzpatrick 17k (f) and CheXpert (g). Error bars indicate s.d.; round brackets indicate aggregate attack AUC; square brackets indicate AUC upper bound implied by record-level DP accounting; dashed grey lines in ROC curve plots indicate random-guessing performance; and dashed lines in eSF plots indicate 95% Greenwood CI.

Near-perfect success for some patients

After demonstrating realistic attacks against two open-source models, we next investigated how effectively MIAs can compromise the privacy of individual patients. To this end, we measured patient-level MIA success across a diverse range of medical datasets using, for each, a large set of target models. Notably, we used state-of-the-art model training techniques (for example, data augmentation, weight decay and learning rate schedules) and furthermore, took explicit countermeasures to prevent overfitting, which is known to exacerbate privacy risks2,28. As a result, the investigated target models, despite being trained on roughly half of the available data each, provide high diagnostic performance within a few percentage points of published baselines (Methods).

Across all investigated datasets and models, we identified a small subset of patients who are highly vulnerable to LR-MIAs. This is indicated by empirical survival functions (eSF) of patient-level MIA AUC scores, which, for a given score, show the proportion of patients with this score or higher (Fig. 2b). By contrast, ROC curves of aggregate attack success and their corresponding AUC scores do not deviate substantially from the random-guessing baseline, thus incorrectly indicating a low attack vulnerability (Fig. 2c and Extended Data Fig. 1c–e). This suggests that average-case metrics of attack success, as used in the standard evaluation protocol, are unsuitable measures of privacy risk. They do not accurately reflect that some records or patients may be highly vulnerable, whereas the vast majority are not.
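To make the eSF concrete, the sketch below computes, for a given threshold, the proportion of patients whose score is at least that threshold. The function name and the scores are hypothetical and chosen only to show how a tail of vulnerable patients can coexist with a near-chance average.

```python
def empirical_survival(scores, threshold):
    """Proportion of scores that are greater than or equal to the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical patient-level MIA AUC scores: most near 0.5, one highly vulnerable.
patient_auc = [0.50, 0.51, 0.49, 0.52, 0.50, 0.53, 0.48, 0.51, 0.50, 0.98]

share_at_risk = empirical_survival(patient_auc, 0.95)
mean_auc = sum(patient_auc) / len(patient_auc)
```

In this toy example the mean score stays close to 0.5 while one in ten patients sits above 0.95, which is the pattern an average-case metric cannot show.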

For the two non-imaging datasets, MIMIC-IV-ED29 (electronic health records) and PTB-XL30 (electrocardiograms), we simulated attack settings in which an attacker only has partial access to the target record (Extended Data Fig. 3). Although MIA success generally decreases under partial data access, a subset of patients retain high AUC scores, even in settings in which the attacker has access to only basic clinical information, such as a patient's age, sex, chief complaints and vital signs (MIMIC-IV-ED), or only the lead I signal from a 12-lead electrocardiogram (PTB-XL).

We verified how resolvable the discovered vulnerabilities are by training models with different levels of record-level (ε, δ)-DP protection (Fig. 2d,e). As expected, we find that patient-level MIA risk decreases with stronger levels of privacy protection (smaller ε values). Moreover, in most scenarios, we observe no violation of the record-level DP guarantee (indicated by the square brackets in the panel legend), although many patients contributed multiple records. Violations are observed only for a subset of patients under strong privacy protection (ε = 1), in which some patients have MIA AUC scores exceeding the upper bound on the MIA AUC implied by the record-level DP guarantee. This behaviour is expected and could be mitigated by implementing patient-level DP accounting.
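One way to obtain an AUC upper bound from an (ε, δ)-DP guarantee is through its hypothesis-testing interpretation: any membership test must satisfy TPR ≤ e^ε · FPR + δ and TPR ≤ 1 − e^(−ε) · (1 − δ − FPR). Integrating the resulting upper envelope of the ROC curve gives a bound on the AUC. The sketch below does this numerically; whether the study derives its bound in exactly this way is an assumption, and the function names and the dataset size D are hypothetical.

```python
import math


def roc_upper_envelope(fpr, epsilon, delta):
    """Largest true-positive rate compatible with (epsilon, delta)-DP at a given FPR."""
    return min(
        1.0,
        math.exp(epsilon) * fpr + delta,
        1.0 - math.exp(-epsilon) * (1.0 - delta - fpr),
    )


def dp_auc_upper_bound(epsilon, delta, steps=100_000):
    """Midpoint-rule integral of the ROC upper envelope over FPR in [0, 1]."""
    width = 1.0 / steps
    total = sum(roc_upper_envelope((i + 0.5) * width, epsilon, delta) for i in range(steps))
    return total * width


dataset_size = 20_000  # hypothetical D; delta = 1/D as in the figure legend
bound_strong = dp_auc_upper_bound(epsilon=1.0, delta=1.0 / dataset_size)
bound_weak = dp_auc_upper_bound(epsilon=8.0, delta=1.0 / dataset_size)
```

With ε = 0 and δ = 0 the envelope collapses to the diagonal and the bound equals 0.5; larger ε values loosen the bound towards 1.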

Larger models, greater risks

Many of the recent AI success stories have been driven not by methodological advances but by scaling up model and dataset sizes31. In light of this scaling trend, we next investigated the impact of model capacity on MIA success. For Fitzpatrick 17k32 and CheXpert, we trained models with increasing capacity, including wide residual networks33 (WRN-28-2 and WRN-40-4) and vision transformers34 (ViT-B/16 and ViT-L/16). Where computationally feasible, vision transformers were trained on images of different sizes: 64 × 64 and 128 × 128 pixels; this is indicated by a trailing number behind the model name (for example, ViT-B/16-64 and ViT-B/16-128).

We find that MIA success (both at the aggregate and patient levels) increases with model capacity. We observe that the relative share of patients highly vulnerable to MIAs increases greatly for larger models, often by an order of magnitude (Fig. 2f,g). For the dermatology dataset (Fitzpatrick 17k), increasing model capacity yields large gains in diagnostic performance with a pronounced increase between WRN-40-4 and ViT-B/16-128, which was pre-trained on a large dataset of more than 14 million natural images. However, simultaneously, the number of patients with near-perfect attack success (AUC score of 0.95 or higher) increases substantially: 0 (WRN-28-2), 1 out of 10,000 (WRN-40-4), 1 out of 1,000 (ViT-B/16-64) and 1 out of 10 (ViT-B/16-128). We observe a similar trend in the much larger dataset CheXpert, although attack success is generally lower. Notably, for CheXpert, vision transformer models do not achieve diagnostic performance competitive with WRN-based models. This is probably because of the diminished utility of natural-image pre-training for medical greyscale images35.

Attack success varies by subgroup

Motivated by recent findings36,37, which revealed that the diagnostic performance of AI models can differ across patient subgroups, we investigated whether differences in privacy risk exist between subgroups. To this end, we focused our analysis on the most vulnerable records (99th MIA AUC percentile) and compared how frequently a subgroup appears in this extreme-risk tail compared with the overall dataset. We did not consider differences in aggregate attack success, as we previously identified this metric as an unsuitable measure of privacy risk.

We find that extreme MIA risk is unequally distributed across patient subgroups when stratifying by disease status, self-reported race, sex, imaging protocol or health insurance. More precisely, for most comparisons, we observe significant differences in subgroup composition between the most vulnerable records and the overall dataset (Fig. 3 and Extended Data Fig. 4). For example, in MIMIC-IV-ED, records from Black patients, patients with Medicaid insurance or patients diagnosed with cancer were observed more frequently than expected among the most vulnerable records (+31%, +126%, and +18% relative change to the overall dataset, respectively). Raw data on the composition of the extreme MIA risk tails as well as the overall datasets are provided in Supplementary Tables 3–16. To find factors that could explain the observed differences, we performed a post hoc test analysis and computed Pearson residuals for all subgroup comparisons (Fig. 3 and Extended Data Fig. 4).
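The comparison between the extreme-risk tail and the overall dataset can be illustrated with a small example. The sketch below assumes a goodness-of-fit setup in which expected tail counts follow the overall subgroup proportions; the exact test configuration used in the study may differ, and the group names and counts are invented. The P value would additionally require the χ² distribution with one fewer degree of freedom than the number of groups, which is not computed here.

```python
import math


def tail_composition_stats(tail_counts, overall_counts):
    """Pearson residuals, chi-squared statistic and relative change per subgroup."""
    n_tail = sum(tail_counts.values())
    n_all = sum(overall_counts.values())
    residuals, relative_change = {}, {}
    for group, total in overall_counts.items():
        expected = n_tail * total / n_all  # tail count if risk were evenly spread
        observed = tail_counts.get(group, 0)
        residuals[group] = (observed - expected) / math.sqrt(expected)
        relative_change[group] = (observed - expected) / expected
    chi_squared = sum(r**2 for r in residuals.values())
    return residuals, chi_squared, relative_change


# Hypothetical counts: group B makes up 10% of the dataset but 20% of the tail.
overall = {"A": 9000, "B": 1000}
tail = {"A": 80, "B": 20}
residuals, chi_squared, relative_change = tail_composition_stats(tail, overall)
```

Here the minority group B has a positive residual and a +100% relative change, whereas the majority group A has a negative residual.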

Fig. 3: Significant differences in extreme MIA risk between patient subgroups.

a–d, The panel rows show two-sided χ²-test results for subgroup counts among the 99th record-level MIA AUC percentile for CheXpert (a), MIMIC-CXR (b), EMBED (c) and MIMIC-IV-ED (d). Pearson residuals measure the contribution of a group to the test statistic. A large, positive value indicates that more records were observed for this group than expected. A negative value indicates the opposite. The column titles show stratification variables; bars are coloured according to the relative share of records of a group in the training dataset. *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001 and NS, not significant. Multiple comparison correction was applied row-wise using the Bonferroni method; statistical significance could not be tested for CheXpert and MIMIC-CXR disease label groups as the categorization is not mutually exclusive; race subgroup comparisons in EMBED are likely confounded by breast density differences. Left to right, CheXpert/MIMIC-CXR disease labels refer to no finding (NF), enlarged cardiomediastinum (EC), cardiomegaly (Cm), lung opacity (LO), lung lesion (LL), oedema (Ed), consolidation (Co), pneumonia (Pn), atelectasis (At), pneumothorax (Px), pleural effusion (PE), pleural other (PO), fracture (Fr) and support devices (SD). Imaging protocol abbreviations refer to anteroposterior (AP), posteroanterior (PA), lateral (L), lateral left (LL), mediolateral oblique (MLO) and craniocaudal (CC). BI-RADS indicates breast cancer assessment: incomplete (BI-RADS-0) to biopsy-proven malignancy (BI-RADS-6) and breast density: mostly fatty (BI-RADS-A) to extremely dense (BI-RADS-D). Left to right, adjusted P values are as follows. CheXpert: 8.1 × 10⁻⁹, 9.7 × 10⁻², 2.1 × 10⁻⁵; MIMIC-CXR: 0.1, 2.2 × 10⁻²³, 1.6 × 10⁻²⁹; EMBED: 1.0 × 10⁻⁶⁸, <1.0 × 10⁻¹⁰⁰, 1.0 × 10⁻⁸, 1.0, 2.7 × 10⁻⁷; MIMIC-IV-ED: 0.61, 4.1 × 10⁻²⁴, 2.3 × 10⁻⁹, 6.3 × 10⁻²⁰, 1.8 × 10⁻², 2.3 × 10⁻⁵³.

We primarily observe large, positive Pearson residuals for underrepresented groups in the datasets, suggesting that relative group size influences MIA risk. Consider, for example, EMBED38, a mammography dataset comprising mostly negative findings, that is, unremarkable mammograms of healthy breasts with no indication of a tumour. Models for this dataset are trained to predict breast density, and thus never have direct access to tumour findings. Despite this, benign tumour findings (BI-RADS-2) and tumour findings suspicious of malignancy (BI-RADS-4) account for a disproportionately large share of the most vulnerable records (+60% and +1,179% relative change to the overall dataset, respectively). Similarly, otherwise relatively uncommon images of almost entirely fatty (BI-RADS-A) or extremely dense (BI-RADS-D) breasts also occur disproportionately frequently (+90% and +755%, respectively).

To further investigate the relationship between group size and MIA risk, we conducted a meta-analysis of all computed Pearson residuals (Extended Data Fig. 5). Confirming previous observations, we find that large positive Pearson residuals occur mostly for small groups (those that contribute less than 20% of the records of a dataset). Moreover, we observe a weak to moderate negative correlation between group size and Pearson residuals. This suggests that the observed differences in MIA risk may, at least in part, be driven by group-size differences in the training data.

Discussion

We present data from the first patient-level privacy audit of medical AI models. Our findings confirm early observations of MIA risk heterogeneity39,40,41,42 and, at the same time, substantially advance previous AI privacy auditing efforts along three key dimensions. First, our work marks a shift towards patient-level risk assessment, which is crucial for real-world clinical datasets, in which individuals often contribute multiple, similar records. Second, we demonstrate that aggregate success rates, as used in the standard evaluation protocol and previous subgroup analyses41,42, underestimate true privacy risks. Third, we confirm that MIA vulnerabilities previously observed on low-dimensional benchmark datasets2,3,4,39,40,41,42 are present, and arguably more critical, in large representative clinical datasets. Below, we briefly discuss our findings and their implications.

The fact that MIAs can achieve near-perfect success rates for individual patients is not adequately captured by the standard evaluation protocol, which measures attack success in aggregate across records. This remains true even when evaluating aggregate attack success at very low false-positive rates (for example, 10⁻⁴), which is the current standard practice. Thus, reporting standards for AI privacy audits need to change. Audits should report the success of privacy attacks at the level of individual data contributors or, if the necessary patient- or person-level identifiers are unavailable, at the record level.

We observed that the number of patients highly vulnerable to MIAs increases drastically for larger models. Although the magnitude of this change in patient-level risk was previously unknown, other works have also reported greater attack success against larger, more performant models3,5,7,43. This observation that privacy risks grow with model size and predictive performance is explained by theoretical research44,45, which postulates that, for long-tailed data distributions, fitting atypical records from the tail is necessary to achieve optimal performance on unseen data at test time. Our results provide further empirical support for this theory and, together, suggest that a trade-off between patient privacy and model performance is inevitable, particularly for rare diseases. Generally, as we found that the number of patients highly vulnerable to MIAs increases by orders of magnitude with larger models, we recommend carefully evaluating the need for the performance improvements they offer.

We found substantial differences in the frequency with which patients from different subgroups experience extreme MIA risk. The fact that some of these groups (for example, self-reported race subgroups in chest radiographs) are not readily distinguishable by human experts raises concerns that MIA risk differences, which probably exist beyond the stratification variables we investigated, may pass unnoticed in practice. We found that the observed risk differences are driven, at least in part, by group-size differences in the training data. Groups of patients that are underrepresented in a model training dataset are often overrepresented among the records most susceptible to MIAs. By contrast, the opposite often holds for majority groups. This finding—that a disproportionately large share of the AI privacy risk burden rests on underrepresented groups—complements the existing literature on health inequalities, which has reported worse health outcomes and life expectancy for marginalized and minority groups46. Our findings suggest that current trends in medical AI development and deployment could exacerbate these health inequalities. Previous research has shown that the diagnostic performance of AI models, which typically increases with the amount of suitable training data, can be significantly lower for underrepresented (minority) groups36,37,47. Thus, there is a possibility of a vicious cycle in which minority groups place decreasing levels of trust in AI model performance and security, leading to a decreased willingness to contribute to model training datasets.

MIAs facilitate data extraction attacks against generative AI models5,6,7. Thus, our findings have potentially far-reaching implications for generative AI privacy risk assessments. Extraction attacks allow for high-fidelity reconstruction of full individual records from the training dataset of a model and have been demonstrated for large language models5, diffusion-based image generation models6 and recently, aligned, production-level large language models7. Although our study focused on discriminative (diagnostic) AI models, the type of attack we studied is generally applicable and can be used against generative models with little to no modification. We thus see the exploration of our proposed methodology for estimating record- and patient-level MIA success against generative models as an interesting direction for future research. Given the substantial computational resources this would require, exploring scalable approximation techniques is another valuable avenue to investigate.

Unlocking the full potential of medical AI will require training models on vast medical datasets; this depends on gaining and upholding the trust of data-contributing patients. To this end, mathematically verifiable approaches to risk mitigation, such as DP48, are emerging as the most promising solution. DP, by carefully perturbing parameter updates with white noise during model training or fine-tuning49, limits the contribution of the data of any individual to the parameter update and, by extension, to the final model. This provably protects the privacy of any data-contributing patient, no matter how unique or atypical their data may be. Our experimental data confirmed that stronger levels of DP protection effectively reduce MIA success for all data-contributing patients. However, we also observed that mitigating MIAs requires stronger levels of DP protection than previously thought3,50. Specifically, our results indicate that fully mitigating MIAs for all data-contributing patients requires implementing DP protection at the patient level rather than at the record level. Recent research has demonstrated that, in practice, AI models can be trained with strong privacy guarantees while incurring minimal degradation in predictive performance compared with a non-private model51,52,53,54. We are thus optimistic that medical AI models protected by DP will have a significant positive impact on health outcomes globally without endangering the privacy of any data-contributing patient.
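The mechanism described above can be illustrated with a single DP-SGD-style update step: each per-example gradient is clipped to a maximum norm, which limits any individual's contribution, and Gaussian noise scaled to that norm is added before averaging. This is a minimal sketch under those assumptions, not the training code of the study; the function names, the clipping norm and the noise multiplier are hypothetical, and the privacy accounting that turns these settings into an (ε, δ) guarantee is not shown.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Average of clipped per-example gradients with Gaussian noise added to the sum."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation is proportional to the clipping norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / len(clipped) for v in noisy]


# Hypothetical per-example gradients; the first one is an outlier with norm 5.
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)
update = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how far any single record, however atypical, can move the parameters, and the noise masks the bounded contribution that remains.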

In summary, we present evidence that MIAs can be highly effective at compromising the privacy of individual data-contributing patients. Given this vulnerability, medical AI models and their deployment contexts should be assessed for the sensitive information that attackers could obtain by successfully inferring training dataset membership. To prevent privacy harm, we recommend that vulnerable models be protected by verifiable risk mitigation strategies and/or strict access controls.
