Disparate privacy risks from medical AI

June 25, 2026

Medical artificial intelligence (AI) has immense potential to improve health outcomes, particularly in regions in which specialized medical expertise is scarce¹. At the same time, AI also poses new challenges and risks, including security vulnerabilities that arise when models are deployed. Untrusted users with access to an AI model may, by merely observing its predictions, steal its parameters^8,9 or perform privacy attacks^2,3,4,5,6,7, which can extract sensitive details about the data used for model training.

Privacy attacks against an AI model can enable detailed inferences about the individuals who contributed to its training data. For example, a membership inference attack (MIA)² attempts to determine whether the data of a specific patient were included in the training dataset of a model. The extent to which this constitutes a privacy violation is nuanced and depends on factors such as the underlying training population and the deployment context of the model. Although inferring membership for a model trained on a general population may be benign, doing so for a model trained on a narrow, disease- or centre-specific cohort acts as a direct proxy for sensitive medical information. For example, a successful MIA against the model in ref. ¹⁰, which predicts anti-cancer immunotherapy efficacy from routine blood test data, reveals that an individual has cancer.

The accelerating deployment of medical AI models trained on sensitive patient data¹¹ calls for rigorous privacy risk assessments. However, previous studies primarily quantified the success rate of MIAs, in aggregate, across all records in a training dataset. This implicitly averages risk across records, thereby obscuring important information on record- and patient-level attack success. Consequently, the risk that an individual faces by contributing their personal data (often multiple records) to an AI training dataset is poorly understood. Given that medical data are a key target for cybercriminals^12,13, and pseudonymization alone is increasingly recognized as insufficient to prevent the re-identification of individuals in large, high-dimensional datasets^14,15,16, there is a need to improve our understanding of the threat that AI privacy attacks pose to individual patients.

Here we show that deploying medical AI models without protective measures can pose substantial privacy risks to individual data-contributing patients. These risks are particularly acute when membership in a training population itself reveals sensitive medical information. Our privacy audit of AI models trained to perform standard diagnostic (supervised classification) tasks quantifies state-of-the-art MIA success^3,4 at the resolution of individual data contributors. Using seven large datasets comprising real-world clinical data, including various types of medical images, electrocardiograms and electronic health records, we demonstrate that the success of a MIA is unequally distributed among data-contributing patients. We show that this disparity exists at two levels: (1) the individual patient level, at which some patients experience near-perfect attack success, whereas others remain essentially unaffected; and (2) the group level, at which patient groups underrepresented in a training dataset are often overrepresented among records most vulnerable to MIAs.

Together, our results indicate that privacy attacks against AI models may be much more effective at compromising the privacy of individual data contributors than previously thought. This suggests that current AI privacy risk reporting practices may underestimate individual-level risk and thus motivates the integration of mathematically verifiable risk mitigation strategies such as differential privacy (DP) into medical AI model development workflows.

Attacking AI by simple hypothesis tests

A popular deployment strategy for AI models gives users access to a model through a prediction interface, which, for a given input (for example, the chest radiograph of a patient), returns a corresponding prediction (for example, a 78% chance of pneumonia). This black-box access to a model can be exploited by an untrusted user to conduct a MIA that reveals the membership status of a target record, that is, whether or not the target record was a member of the training dataset of a model (Fig. 1a). To infer membership status, MIAs typically make use of the fact that AI models are often slightly more confident about their predictions on training data than on non-training data.

**Fig. 1: MIA and evaluation strategies.**

Likelihood-ratio MIAs (LR-MIAs)^3,4, the current state-of-the-art in MIAs, frame membership inference as a simple vs. simple hypothesis testing problem on the prediction confidence provided by the target model. In essence, LR-MIAs compare the likelihood of the predicted confidence of the target model for the target record under the null (the target record was not a member) and the alternative hypothesis (the target record was a member). Here, the parameters of the distributions under the two hypotheses are specified by parametric fitting of sample confidence values obtained from reference models. Reference models are models assumed to be trained by the attacker and are ideally, but not necessarily, of similar architecture as the target model and trained on data similar to the training dataset of the target model.
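The hypothesis test described above can be illustrated with a minimal sketch. It assumes that the attacker has already collected confidence values for the target record from reference models trained with the record (`ref_in`) and without it (`ref_out`). The function name, the variable names and the example values are hypothetical, and the published attacks add refinements (such as confidence rescaling) that are omitted here.

```python
import math
from statistics import mean, stdev


def gaussian_log_pdf(x, mu, sigma):
    """Log-density of a Gaussian, written out to avoid underflow in the tails."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def lr_mia_log_ratio(target_conf, ref_in, ref_out):
    """Log-likelihood ratio of the target confidence: member vs. non-member."""
    # Parametric fit: one Gaussian per hypothesis, estimated from reference models.
    log_alt = gaussian_log_pdf(target_conf, mean(ref_in), stdev(ref_in))      # record was a member
    log_null = gaussian_log_pdf(target_conf, mean(ref_out), stdev(ref_out))  # record was not a member
    return log_alt - log_null


# Hypothetical confidence values for one target record (illustration only).
ref_in = [2.9, 3.1, 3.4, 2.7, 3.0]
ref_out = [1.8, 2.2, 2.0, 1.6, 2.4]

score_high = lr_mia_log_ratio(3.2, ref_in, ref_out)  # positive: evidence for membership
score_low = lr_mia_log_ratio(1.9, ref_in, ref_out)   # negative: evidence against membership
```

A positive log-ratio favours the alternative hypothesis (member). In an actual attack, the decision threshold on this statistic would be chosen to meet a target false-positive rate.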

Note that objectively larger threats are posed by privacy attacks with stronger assumptions on a potential attacker, such as access to model parameters¹⁷, access to parameter updates during model training¹⁸ or, furthermore, the ability to modify the model architecture^19,20. However, we do not consider them in this study as their strong assumptions are not realistic for careful, practical deployment scenarios. By contrast, the type of attack we consider here requires querying the target model only once (to obtain a prediction for the target record) and may thus be executed by any attacker posing as a real user of an AI system. Notably, as the attacks we study are executed against fully trained models, data-governance-preserving techniques such as federated/swarm-learning²¹ provide no protection.

From aggregate to patient-level risk

MIA performance is evaluated through a receiver operating characteristic (ROC) analysis²² on numerous repetitions of the MIA game scenario, in which an untrusted user is challenged to guess the membership status of a given record (Fig. 1a). In practice, owing to the computational cost of training AI models, attack success is typically evaluated using a single target model. More specifically, a target model is trained on a random subset of the training dataset, and subsequently, an ROC analysis is performed on the aggregated membership predictions for all records in the dataset (Fig. 1b). Although practical, this approach has a key shortcoming: it provides no indication of the performance of the attack for individual records or patients.

To address this issue, we propose a simple technique for estimating record-level and, by extension, patient-level vulnerability to LR-MIAs (Fig. 1c). In brief, using a large set of target models (N = 200) trained on random patient subsets, we estimate, for each training record, sampling distributions of the confidence of the target model under the null and alternative hypotheses in LR-MIAs. In other words, we estimate empirical distributions of confidence values as provided by target models, partitioned into models trained and not trained on the target record. Because these distributions are assumed to take Gaussian form in LR-MIAs^3,4, record-level attack success, as measured by the area under the ROC curve (AUC), can be calculated in closed form (Methods). A high AUC score, close to the maximum value of 1.0, suggests high privacy risk: a MIA for this record could achieve high sensitivity with few or no false positives. Notably, the record-level MIA AUC also offers a probabilistic interpretation²²: it is the probability that a confidence score from a target model trained on the target record is larger than a score from a target model not trained on the target record.
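As a rough illustration of the closed-form calculation, the sketch below uses the standard bi-normal AUC formula, Φ((μ_in − μ_out) / √(σ²_in + σ²_out)), which follows from the Gaussian assumption and the probabilistic interpretation given above. The Methods are not reproduced in this section, so the exact estimator used in the study may differ. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def record_level_auc(conf_in, conf_out):
    """Bi-normal AUC for one record from per-model confidence values.

    conf_in:  confidences from target models trained on the record
    conf_out: confidences from target models not trained on the record
    """
    mu_in, mu_out = mean(conf_in), mean(conf_out)
    var_in, var_out = variance(conf_in), variance(conf_out)  # sample variances
    # P(score_in > score_out) when both scores are independent Gaussians.
    z = (mu_in - mu_out) / math.sqrt(var_in + var_out)
    return NormalDist().cdf(z)


# Hypothetical confidences for a well-separated (vulnerable) record.
auc_vulnerable = record_level_auc([3.0, 3.2, 2.8, 3.1], [2.0, 2.1, 1.9, 2.2])
# Identical distributions give chance-level attack success.
auc_chance = record_level_auc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

The sketch assumes that at least one of the two samples has non-zero variance; otherwise the ratio is undefined.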

Correctly determining the membership status for one of the records contributed by an individual patient reveals the membership status of the patient. Thus, we compute patient-level scores by taking the maximum across all record-level scores for a given patient. The raw record-level scores and the aggregated patient-level scores can be found in Extended Data Fig. 1.
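The maximum-based aggregation can be sketched as follows. The input layout (pairs of patient identifier and record-level AUC) and the identifiers are assumptions made for illustration.

```python
def patient_level_scores(record_scores):
    """Aggregate record-level MIA AUC scores to one score per patient.

    record_scores: iterable of (patient_id, record_auc) pairs.
    A patient is as exposed as their most vulnerable record, hence the maximum.
    """
    best = {}
    for patient_id, auc in record_scores:
        if patient_id not in best or auc > best[patient_id]:
            best[patient_id] = auc
    return best


# Hypothetical example: patient "p1" contributed three records, "p2" one.
scores = patient_level_scores([
    ("p1", 0.52), ("p1", 0.97), ("p1", 0.60),
    ("p2", 0.55),
])
```

Taking the mean instead would dilute a single highly identifiable record among a patient's other records, which is why the maximum is the appropriate aggregate here.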

Notably, our technique for measuring record-level attack success reduces to estimating the bi-normal AUC from sample statistics and thus has desirable statistical properties. Its standard error at the record level can be computed in closed form²³ (Methods). As expected, using a total of N = 200 target models evenly split between null and alternative hypotheses for each record, the standard error of the record-level MIA AUC is small across all records in the investigated datasets (Extended Data Fig. 2).

Attacking open-source models

Recent advances in attack design have made LR-MIAs much more practical. To illustrate the practical feasibility of conducting MIAs, we demonstrate attacks against two chest radiograph models from the TorchXrayVision²⁴ library. We used the Robust Membership Inference Attack⁴ (RMIA), an improved LR-MIA that requires only one or two reference models, compared with more than 100 for the Likelihood Ratio Attack³ (LiRA). RMIA achieves this efficiency gain by effectively using reference data (data similar to the target record) alongside the target record to query the target model. Crucially, the attack does not require knowledge of the membership status of the reference data.

We simulated a realistic attack setting in which an attacker lacks access to the training dataset of the target model to train reference models and is further constrained by computational resources. Specifically, we used only a single pre-trained PadChest²⁵ model as a reference model to perform attacks against the CheXpert²⁶ and MIMIC-CXR²⁷ models of the library. In this setting, also known as an offline attack, an attacker incurs no computational cost in training the reference model. Instead, they simply need to obtain predictions from the reference model for both the target record and the reference data. This can be done efficiently on commodity hardware without a graphics processing unit (GPU). To conduct the attack, we queried the target model once to collect confidence values for all target records. Using this collection, we then computed RMIA test statistics for each target record by randomly selecting, independent of membership status, N = 500 confidence values from the other targets in this collection as reference data. This strategy would effectively conceal the additional reference data queries to the target model in a real attack.
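The pairwise comparison at the core of RMIA can be sketched in simplified form. The sketch assumes a single reference model and strictly positive confidence values, and it omits the correction that the published offline attack applies when estimating the marginal probability of a record. All names, the threshold `gamma` and the example values are hypothetical.

```python
def rmia_score(target_conf, ref_conf, pop_target_conf, pop_ref_conf, gamma=1.0):
    """Simplified RMIA-style test statistic for one target record.

    target_conf:     target-model confidence on the target record
    ref_conf:        reference-model confidence on the target record
    pop_target_conf: target-model confidences on the reference data
    pop_ref_conf:    reference-model confidences on the same reference data
    Returns the fraction of reference records that the target record "dominates".
    """
    ratio_x = target_conf / ref_conf  # how much more the target model favours x
    wins = 0
    for pt, pr in zip(pop_target_conf, pop_ref_conf):
        ratio_z = pt / pr             # same ratio for a reference record z
        if ratio_x / ratio_z >= gamma:
            wins += 1
    return wins / len(pop_target_conf)


# Hypothetical values: three reference records, one target record.
score = rmia_score(
    target_conf=0.9, ref_conf=0.5,
    pop_target_conf=[0.6, 0.8, 0.7],
    pop_ref_conf=[0.6, 0.5, 0.35],
)
```

A score close to 1 means that the target model is unusually confident on the target record relative to both the reference model and the reference data, which is taken as evidence of membership.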

We evaluated attack success on a combined dataset of records from CheXpert and MIMIC-CXR (N = 25,000 each), which were, respectively, labelled as members and non-members for the CheXpert model (and vice versa for the MIMIC-CXR model). In this setting, RMIA achieved substantial aggregate success with respective AUC scores of 0.61 and 0.65 (Fig. 2a). Note that owing to the distribution shift between members and non-members, results from this evaluation setting are not directly comparable to the standard evaluation protocol in which members and non-members are sampled at random from the training dataset. Notably, however, such a distribution shift is expected in a real attack, and this setting is thus of high interest.

**Fig. 2: MIAs pose substantial privacy risks to individual data-contributing patients.**

Near-perfect success for some patients

After demonstrating realistic attacks against two open-source models, we next investigated how effectively MIAs can compromise the privacy of individual patients. To this end, we measured patient-level MIA success across a diverse range of medical datasets using, for each, a large set of target models. Notably, we used state-of-the-art model training techniques (for example, data augmentation, weight decay and learning rate schedules) and furthermore, took explicit countermeasures to prevent overfitting, which is known to exacerbate privacy risks^2,28. As a result, the investigated target models, despite being trained on roughly half of the available data each, provide high diagnostic performance within a few percentage points of published baselines (Methods).

Across all investigated datasets and models, we identified a small subset of patients who are highly vulnerable to LR-MIAs. This is indicated by empirical survival functions (eSF) of patient-level MIA AUC scores, which, for a given score, show the proportion of patients with this score or higher (Fig. 2b). By contrast, ROC curves of aggregate attack success and their corresponding AUC scores do not deviate substantially from the random-guessing baseline, thus incorrectly indicating a low attack vulnerability (Fig. 2c and Extended Data Fig. 1c–e). This suggests that average-case metrics of attack success, as used in the standard evaluation protocol, are unsuitable measures of privacy risk. They do not accurately reflect that some records or patients may be highly vulnerable, whereas the vast majority are not.
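The contrast between an average-case summary and the empirical survival function can be illustrated with a short sketch. The scores below are hypothetical and chosen only to show how a single highly vulnerable patient disappears in the mean.

```python
from statistics import mean


def empirical_survival(scores, threshold):
    """Proportion of patients whose MIA AUC is at or above the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical patient-level AUC scores: most near chance, one near-perfect.
patient_auc = [0.50, 0.52, 0.55, 0.97]

average_risk = mean(patient_auc)                        # looks close to chance
share_near_perfect = empirical_survival(patient_auc, 0.95)  # exposes the tail
```

Evaluating the survival function over a grid of thresholds yields a curve of the kind described for Fig. 2b.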

For the two non-imaging datasets, MIMIC-IV-ED²⁹ (electronic health records) and PTB-XL³⁰ (electrocardiograms), we simulated attack settings in which an attacker only has partial access to the target record (Extended Data Fig. 3). Although MIA success generally decreases under partial data access, a subset of patients retain high AUC scores, even in settings in which the attacker has access to only basic clinical information, such as a patient's age, sex, chief complaints and vital signs (MIMIC-IV-ED), or only the lead I signal from a 12-lead electrocardiogram (PTB-XL).

We verified that the discovered vulnerabilities can be mitigated by training models with different levels of record-level (ε, δ)-DP protection (Fig. 2d,e). As expected, we find that patient-level MIA risk decreases with stronger levels of privacy protection (smaller ε values). Moreover, in most scenarios, we observe no violation of the record-level DP guarantee (indicated by the square brackets in the panel legend), although many patients contributed multiple records. Violations are observed only under strong privacy protection (ε = 1), in which case some patients have MIA AUC scores exceeding the upper bound on the MIA AUC implied by the record-level DP guarantee. This behaviour is expected and could be mitigated by implementing patient-level DP accounting.
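One way to obtain an upper bound on the MIA AUC from an (ε, δ)-DP guarantee is to integrate the well-known hypothesis-testing bound on the ROC curve, TPR ≤ min(1, e^ε·FPR + δ, 1 − e^(−ε)·(1 − δ − FPR)). The sketch below does this numerically. It is an illustration under that assumption; the bound reported in the figure legends is derived in the Methods, which are not reproduced here, and may be computed differently.

```python
import math


def dp_roc_bound(fpr, eps, delta):
    """Largest true-positive rate any attack can reach at a given FPR under (eps, delta)-DP."""
    return min(
        1.0,
        math.exp(eps) * fpr + delta,
        1.0 - math.exp(-eps) * (1.0 - delta - fpr),
    )


def dp_auc_bound(eps, delta, steps=100_000):
    """Area under the bounding ROC curve, via the trapezoidal rule on a uniform grid."""
    h = 1.0 / steps
    ys = [dp_roc_bound(i * h, eps, delta) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


# No privacy loss (eps = 0, delta = 0) leaves the attacker at chance level.
bound_chance = dp_auc_bound(0.0, 0.0)
# Bounds loosen as eps grows (delta chosen arbitrarily for illustration).
bound_strong = dp_auc_bound(1.0, 1e-6)
bound_weak = dp_auc_bound(8.0, 1e-6)
```

A patient-level AUC estimate above such a bound for a record-level guarantee does not contradict DP; it reflects that a patient with several records is protected less strongly than a single record.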

Larger models, greater risks

Many of the recent AI success stories have been driven not by methodological advances but by scaling up model and dataset sizes³¹. In light of this scaling trend, we next investigated the impact of model capacity on MIA success. For Fitzpatrick 17k³² and CheXpert, we trained models with increasing capacity, including wide residual networks³³ (WRN-28-2 and WRN-40-4) and vision transformers³⁴ (ViT-B/16 and ViT-L/16). Where computationally feasible, vision transformers were trained on images of different sizes: 64 × 64 and 128 × 128 pixels; this is indicated by a trailing number behind the model name (for example, ViT-B/16-64 and ViT-B/16-128).

We find that MIA success (both at the aggregate and patient levels) increases with model capacity. We observe that the relative share of patients highly vulnerable to MIAs increases greatly for larger models, often by an order of magnitude (Fig. 2f,g). For the dermatology dataset (Fitzpatrick 17k), increasing model capacity yields large gains in diagnostic performance with a pronounced increase between WRN-40-4 and ViT-B/16-128, which was pre-trained on a large dataset of more than 14 million natural images. However, simultaneously, the number of patients with near-perfect attack success (AUC score of 0.95 or higher) increases substantially: 0 (WRN-28-2), 1 out of 10,000 (WRN-40-4), 1 out of 1,000 (ViT-B/16-64) and 1 out of 10 (ViT-B/16-128). We observe a similar trend in the much larger dataset CheXpert, although attack success is generally lower. Notably, for CheXpert, vision transformer models do not achieve diagnostic performance competitive with WRN-based models. This is probably because of the diminished utility of natural-image pre-training for medical greyscale images³⁵.

Attack success varies by subgroup

Motivated by recent findings^36,37, which revealed that the diagnostic performance of AI models can differ across patient subgroups, we investigated whether differences in privacy risk exist between subgroups. To this end, we focused our analysis on the most vulnerable records (99th MIA AUC percentile) and compared how frequently a subgroup appears in this extreme-risk tail compared with the overall dataset. We did not consider differences in aggregate attack success, as we previously identified this metric as an unsuitable measure of privacy risk.

We find that extreme MIA risk is unequally distributed across patient subgroups when stratifying by disease status, self-reported race, sex, imaging protocol or health insurance. More precisely, for most comparisons, we observe significant differences in subgroup composition between the most vulnerable records and the overall dataset (Fig. 3 and Extended Data Fig. 4). For example, in MIMIC-IV-ED, records from Black patients, patients with Medicaid insurance or patients diagnosed with cancer were observed more frequently than expected among the most vulnerable records (+31%, +126%, and +18% relative change to the overall dataset, respectively). Raw data on the composition of the extreme MIA risk tails as well as the overall datasets are provided in Supplementary Tables 3–16. To find factors that could explain the observed differences, we performed a post hoc test analysis and computed Pearson residuals for all subgroup comparisons (Fig. 3 and Extended Data Fig. 4).
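The two quantities used in this comparison can be sketched for a single subgroup. The relative change follows directly from the description above; the Pearson residual is written in its standard form, (observed − expected) / √expected, and the exact test configuration in the Methods may differ. All counts below are hypothetical.

```python
import math


def tail_vs_overall(tail_count, tail_total, overall_count, overall_total):
    """Compare a subgroup's share in the extreme-risk tail with its share overall.

    Returns (relative_change, pearson_residual).
    """
    share_overall = overall_count / overall_total
    expected = tail_total * share_overall       # count expected if the tail mirrored the dataset
    relative_change = tail_count / expected - 1.0
    pearson_residual = (tail_count - expected) / math.sqrt(expected)
    return relative_change, pearson_residual


# Hypothetical counts: a subgroup with 10% of all records and 16% of the tail.
rel_change, residual = tail_vs_overall(
    tail_count=160, tail_total=1_000,
    overall_count=10_000, overall_total=100_000,
)
```

With these counts, the subgroup is overrepresented in the tail by 60%, and the large positive residual marks it as a main contributor to the deviation from the overall composition.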

**Fig. 3: Significant differences in extreme MIA risk between patient subgroups.**

We primarily observe large, positive Pearson residuals for underrepresented groups in the datasets, suggesting that relative group size influences MIA risk. Consider, for example, EMBED³⁸, a mammography dataset comprising mostly negative findings, that is, unremarkable mammograms of healthy breasts with no indication of a tumour. Models for this dataset are trained to predict breast density, and thus never have direct access to tumour findings. Despite this, benign tumour findings (BI-RADS-2) and tumour findings suspicious of malignancy (BI-RADS-4) account for a disproportionately large share of the most vulnerable records (+60% and +1,179% relative change to the overall dataset, respectively). Similarly, otherwise relatively uncommon images of almost entirely fatty (BI-RADS-A) or extremely dense (BI-RADS-D) breasts also occur disproportionately frequently (+90% and +755%, respectively).

To further investigate the relationship between group size and MIA risk, we conducted a meta-analysis of all computed Pearson residuals (Extended Data Fig. 5). Confirming previous observations, we find that large positive Pearson residuals occur mostly for small groups (those that contribute less than 20% of the records of a dataset). Moreover, we observe a weak to moderate negative correlation between group size and Pearson residuals. This suggests that the observed differences in MIA risk may, at least in part, be driven by group-size differences in the training data.

Discussion

We present data from the first patient-level privacy audit of medical AI models. Our findings confirm early observations of MIA risk heterogeneity^39,40,41,42 and, at the same time, substantially advance previous AI privacy auditing efforts along three key dimensions. First, our work marks a shift towards patient-level risk assessment, which is crucial for real-world clinical datasets, in which individuals often contribute multiple, similar records. Second, we demonstrate that aggregate success rates, as used in the standard evaluation protocol and previous subgroup analyses^41,42, underestimate true privacy risks. Third, we confirm that MIA vulnerabilities previously observed on low-dimensional benchmark datasets^2,3,4,39,40,41,42 are present, and arguably more critical, in large representative clinical datasets. Below, we briefly discuss our findings and their implications.

The fact that MIAs can achieve near-perfect success rates for individual patients is not adequately captured by the standard evaluation protocol, which measures attack success in aggregate across records. This remains true even when evaluating aggregate attack success at very low false-positive rates (for example, 10⁻⁴), which is the current standard practice. Thus, reporting standards for AI privacy audits need to change. Audits should report the success of privacy attacks at the level of individual data contributors or, if the necessary patient- or person-level identifiers are unavailable, at the record level.

We observed that the number of patients highly vulnerable to MIAs increases drastically for larger models. Although the magnitude of this change in patient-level risk was previously unknown, other works have also reported greater attack success against larger, more performant models^3,5,7,43. This observation that privacy risks grow with model size and predictive performance is explained by theoretical research^44,45, which postulates that, for long-tailed data distributions, fitting atypical records from the tail is necessary to achieve optimal performance on unseen data at test time. Our results provide further empirical support for this theory and, together, suggest that a trade-off between patient privacy and model performance is inevitable, particularly for rare diseases. Generally, as we found that the number of patients highly vulnerable to MIAs increases by orders of magnitude with larger models, we recommend carefully evaluating the need for the performance improvements they offer.

We found substantial differences in the frequency with which patients from different subgroups experience extreme MIA risk. The fact that some of these groups (for example, self-reported race subgroups in chest radiographs) are not readily distinguishable by human experts raises concerns that MIA risk differences, which probably exist beyond the stratification variables we investigated, may pass unnoticed in practice. We found that the observed risk differences are driven, at least in part, by group-size differences in the training data. Groups of patients that are underrepresented in a model training dataset are often overrepresented among the records most susceptible to MIAs. By contrast, the opposite often holds for majority groups. This finding—that a disproportionately large share of the AI privacy risk burden rests on underrepresented groups—complements the existing literature on health inequalities, which has reported worse health outcomes and life expectancy for marginalized and minority groups⁴⁶. Our findings suggest that current trends in medical AI development and deployment could exacerbate these health inequalities. Previous research has shown that the diagnostic performance of AI models, which typically increases with the amount of suitable training data, can be significantly lower for underrepresented (minority) groups^36,37,47. Thus, there is a possibility of a vicious cycle in which minority groups place decreasing levels of trust in AI model performance and security, leading to a decreased willingness to contribute to model training datasets.

MIAs facilitate data extraction attacks against generative AI models^5,6,7. Thus, our findings have potentially far-reaching implications for generative AI privacy risk assessments. Extraction attacks allow for high-fidelity reconstruction of full individual records from the training dataset of a model and have been demonstrated for large language models⁵, diffusion-based image generation models⁶ and recently, aligned, production-level large language models⁷. Although our study focused on discriminative (diagnostic) AI models, the type of attack we studied is generally applicable and can be used against generative models with little to no modification. We thus see the exploration of our proposed methodology for estimating record- and patient-level MIA success against generative models as an interesting direction for future research. Given the substantial computational resources this would require, exploring scalable approximation techniques is another valuable avenue to investigate.

Unlocking the full potential of medical AI will require training models on vast medical datasets; this depends on gaining and upholding the trust of data-contributing patients. To this end, mathematically verifiable approaches to risk mitigation, such as DP⁴⁸, are emerging as the most promising solution. DP, by carefully perturbing parameter updates with white noise during model training or fine-tuning⁴⁹, limits the contribution of the data of any individual to the parameter update and, by extension, to the final model. This provably protects the privacy of any data-contributing patient, no matter how unique or atypical their data may be. Our experimental data confirmed that stronger levels of DP protection effectively reduce MIA success for all data-contributing patients. However, we also observed that mitigating MIAs requires stronger levels of DP protection than previously thought^3,50. Specifically, our results indicate that fully mitigating MIAs for all data-contributing patients requires implementing DP protection at the patient level rather than at the record level. Recent research has demonstrated that, in practice, AI models can be trained with strong privacy guarantees while incurring minimal degradation in predictive performance compared with a non-private model^51,52,53,54. We are thus optimistic that medical AI models protected by DP will have a significant positive impact on health outcomes globally without endangering the privacy of any data-contributing patient.

In summary, we present evidence that MIAs can be highly effective at compromising the privacy of individual data-contributing patients. Given this vulnerability, medical AI models and their deployment contexts should be assessed for the sensitive information that attackers could obtain by successfully inferring training dataset membership. To prevent privacy harm, we recommend that vulnerable models be protected by verifiable risk mitigation strategies and/or strict access controls.
