Merlin: a computed tomography vision–language foundation model and dataset

  • Schöckel, L. et al. Developments in X-ray contrast media and the potential impact on computed tomography. Invest. Radiol. 55, 592–597 (2020).

  • Kanal, K. M. et al. U.S. diagnostic reference levels and achievable doses for 10 adult CT examinations. Radiology 284, 120–133 (2017).

  • Taschetta-Millane, M. The evolving computed tomography market. Imaging Technology News https://www.itnonline.com/article/evolving-computed-tomography-market (2024).

  • Hudnall, C. Maximum capacity: overloaded radiologists are grappling with solutions to a booming volume crisis. American College of Radiology https://www.acr.org/Practice-Management-Quality-Informatics/ACR-Bulletin/Articles/April-2024/Maximum-Capacity (2024).

  • Milburn, J. Workforce-shortage. How will we solve our radiology workforce shortage? American College of Radiology https://www.acr.org/Practice-Management-Quality-Informatics/ACR-Bulletin/Articles/March-2024/How-Will-We-Solve-Our-Radiology-Workforce-Shortage (2024).

  • Rimmer, A. Radiologist shortage leaves patient care at risk, warns royal college. BMJ 359, j4683 (2017).

  • Paschali, M. et al. Foundation models in radiology: what, how, why, and why not. Radiology 314, e240597 (2025).

  • Zhang, S. et al. A multimodal biomedical foundation model trained from fifteen million image–text pairs. NEJM AI 2, AIoa2400640 (2025).

  • Chaves, J. M. et al. A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings. Nat. Commun. 16, 3108 (2025).

  • Tu, T. et al. Towards generalist biomedical AI. NEJM AI 1, AIoa2300138 (2024).

  • Wu, C., Zhang, X., Zhang, Y., Wang, Y. & Xie, W. Towards generalist foundation model for radiology by leveraging web-scale 2D & 3D medical data. Nat. Commun. 16, 7866 (2025).

  • Chen, Z. et al. CheXagent: Towards a foundation model for chest X-ray interpretation. In AAAI 2024 Spring Symposium on Clinical Foundation Models (AAAI, 2024).

  • Udare, A. et al. Radiologist productivity analytics: factors impacting abdominal pelvic CT exam reporting times. J. Digit. Imaging 35, 87–97 (2022).

  • Liu, D. et al. Fully automated CT-based adiposity assessment: comparison of the L1 and L3 vertebral levels for opportunistic prediction. Abdom. Radiol. 48, 787–795 (2023).

  • Blankemeier, L. et al. Opportunistic incidence prediction of multiple chronic diseases from abdominal CT imaging using multi-task learning. In Proc. 25th International Conference on Medical Image Computing and Computer-Assisted Intervention 309–318 (Springer, 2022).

  • Zambrano Chaves, J. M. et al. Opportunistic assessment of ischemic heart disease risk using abdominopelvic computed tomography and medical record data: a multimodal explainable artificial intelligence approach. Sci. Rep. 13, 21034 (2023).

  • Cao, K. et al. Large-scale pancreatic cancer detection via non-contrast CT and deep learning. Nat. Med. 29, 3033–3043 (2023).

  • Wang, Y.-R. et al. Screening and diagnosis of cardiovascular disease using artificial intelligence-enabled cardiac magnetic resonance imaging. Nat. Med. 30, 1471–1480 (2024).

  • Langlotz, C. P. The future of AI and informatics in radiology: 10 predictions. Radiology 309, e231114 (2023).

  • Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices (US Food and Drug Administration, 2023).

  • Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning 8748–8763 (PMLR, 2021).

  • Schuhmann, C. et al. Laion-5b: an open large-scale dataset for training next generation image-text models. Adv. Neural Inf. Process. Syst. 35, 25278–25294 (2022).


  • Larson, D. B., Magnus, D. C., Lungren, M. P., Shah, N. H. & Langlotz, C. P. Ethics of using and sharing clinical imaging data for artificial intelligence: a proposed framework. Radiology 295, 675–682 (2020).

  • Hyland, S. L. et al. MAIRA-1: a specialised large multimodal model for radiology report generation. Preprint at https://arxiv.org/abs/2311.13668 (2023).

  • Huang, S.-C. et al. PENet—a scalable deep-learning model for automated diagnosis of pulmonary embolism using volumetric CT imaging. npj Digit. Med. 3, 61 (2020).

  • Christensen, M., Vukadinovic, M., Yuan, N. & Ouyang, D. Vision–language foundation model for echocardiogram interpretation. Nat. Med. 30, 1481–1488 (2024).

  • Polevikov, S. Med-gemini by Google: A boon for researchers, a bane for doctors. AI Health Uncut https://sergeiai.substack.com/p/googles-med-gemini-im-excited-and (2024).

  • Fleming, S. L. et al. Medalign: a clinician-generated dataset for instruction following with electronic medical records. Proc. AAAI Conf. Artif. Intell. 38, 22021–22030 (2024).

  • Liebl, H. et al. A computed tomography vertebral segmentation dataset with anatomical variations and multi-vendor scanner data. Sci. Data 8, 284 (2021).

  • Wasserthal, J. et al. TotalSegmentator: robust segmentation of 104 anatomic structures in CT images. Radiol. Artif. Intell. 5, e230024 (2023).

  • Cherti, M. et al. Reproducible scaling laws for contrastive language–image learning. In Proc. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2818–2829 (IEEE, 2023).

  • Löffler, M. T. et al. A vertebral segmentation dataset with fracture grading. Radiol. Artif. Intell. 2, e190138 (2020).

  • Carreira, J. & Zisserman, A. Quo vadis, action recognition? A new model and the kinetics dataset. In Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition 6299–6308 (IEEE, 2017).

  • Denny, J. C. et al. Systematic comparison of phenomewide association study of electronic medical record data and genome-wide association study data. Nat. Biotechnol. 31, 1102–1111 (2013).

  • Liu, Z. et al. A convnet for the 2020s. In Proc. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition 11976–11986 (IEEE, 2022).

  • Liu, Z. et al. Swin transformer: hierarchical vision transformer using shifted windows. In Proc. 2021 IEEE/CVF International Conference on Computer Vision 10012–10022 (IEEE, 2021).

  • Li, Y., Wehbe, R. M., Ahmad, F. S., Wang, H. & Luo, Y. Clinical-Longformer and Clinical-BigBird: transformers for long clinical sequences. Preprint at https://arxiv.org/abs/2201.11838 (2022).

  • Delbrouck, J.-B. et al. Improving the factual correctness of radiology report generation with semantic rewards. In Findings of the Association for Computational Linguistics: EMNLP 2022 4348–4360 (Association for Computational Linguistics, 2022).

  • Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations (ICLR, 2020).

  • Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. In Proc. Text Summarization Branches Out 74–81 (Association for Computational Linguistics, 2004).

  • Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics 311–318 (Association for Computational Linguistics, 2002).

  • Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).

  • Codella, N. C. F. et al. MedImageInsight: an open-source embedding model for general domain medical imaging. Preprint at https://arxiv.org/abs/2410.06542 (2024).

  • Yang, L. et al. Advancing multimodal medical capabilities of Gemini. Preprint at https://arxiv.org/abs/2405.03162 (2024).

  • Hamamci, I. E. et al. Developing generalist foundation models from a multimodal dataset for 3D computed tomography. Preprint at https://arxiv.org/abs/2403.17834 (2024).

  • Niu, C. et al. Medical multimodal multitask foundation model for lung cancer screening. Nat. Commun. 16, 1523 (2025).

  • Pai, S. et al. Vision foundation models for computed tomography. Preprint at https://arxiv.org/abs/2501.09001 (2025).

  • Huang, S.-C. et al. Self-supervised learning for medical image classification: a systematic review and implementation guidelines. npj Digit. Med. 6, 74 (2023).

  • Tang, Y. et al. Self-supervised pre-training of Swin transformers for 3D medical image analysis. In Proc. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition 20730–20740 (IEEE, 2022).

  • He, K. et al. Masked autoencoders are scalable vision learners. In Proc. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition 16000–16009 (IEEE, 2021).

  • Laurençon, H., Tronchon, L., Cord, M. & Sanh, V. What matters when building vision-language models? In Proc. 38th International Conference on Neural Information Processing Systems 87874–87907 (NIPS, 2024).

  • Li, Z. et al. Monkey: Image resolution and text label are important things for large multi-modal models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 26763–26773 (IEEE, 2024).

  • Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. International Conference on Machine Learning 1597–1607 (PMLR, 2020).

  • Van den Oord, A., Li, Y. & Vinyals, O. Representation learning with contrastive predictive coding. Preprint at https://arxiv.org/abs/1807.03748 (2018).

  • Reis, E. P. Automated abdominal CT contrast phase detection using an interpretable and open-source artificial intelligence algorithm. Eur. Radiol. 34, 6680–6687 (2024).

  • Van Uden, C. et al. Exploring the versatility of zero-shot CLIP for interstitial lung disease classification. Preprint at https://arxiv.org/abs/2306.01111 (2023).

  • Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR, 2019).

  • Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).

  • Chronic Kidney Disease in the United States, 2023 (Centers for Disease Control and Prevention, 2023).

  • By the Numbers: Diabetes in America (Centers for Disease Control and Prevention, 2022).

  • Facts about Hypertension (Centers for Disease Control and Prevention, 2023).

  • What is Coronary Heart Disease? (US Department of Health and Human Services, 2023).

  • Gu, J., Sanchez, R., Chauhan, A., Fazio, S. & Wong, N. Lipid treatment status and goal attainment among patients with atherosclerotic cardiovascular disease in the United States: a 2019 update. Am. J. Prev. Cardiol. 10, 100336 (2022).

  • Wright, N. C. et al. The recent prevalence of osteoporosis and low bone mass in the United States based on bone mineral density at the femoral neck or lumbar spine. J. Bone Miner. Res. 29, 2520–2526 (2014).

  • Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1 (2023).

  • Hu, E. J. et al. LoRA: low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR, 2022).

  • Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).

  • Van Veen, D. et al. RadAdapt: radiology report summarization via lightweight domain adaptation of large language models. In Proc. The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks 449–460 (Association for Computational Linguistics, 2023).

  • Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Proc. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015 234–241 (Springer, 2015).

  • Hatamizadeh, A. et al. UNETR: transformers for 3D medical image segmentation. In Proc. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision 574–584 (IEEE, 2022).

  • Xue, C. et al. AI-based differential diagnosis of dementia etiologies on multimodal data. Nat. Med. 30, 2977–2989 (2024).

  • Yang, A. et al. Qwen3 technical report. Preprint at https://arxiv.org/abs/2505.09388 (2025).
