Monday, March 31, 2025

Multimodal generative AI for medical image interpretation

  • Côté, M. J. & Smith, M. A. Forecasting the demand for radiology services. Health Syst. 7, 79–88 (2018).

  • Al Yassin, A., Sadaghiani, M. S., Mohan, S., Bryan, R. N. & Nasrallah, I. It is about “time”: academic neuroradiologist time distribution for interpreting brain MRIs. Acad. Radiol. 25, 1521–1525 (2018).

  • Reiner, B. I., Knight, N. & Siegel, E. L. Radiology reporting, past, present, and future: the radiologist’s perspective. J. Am. Coll. Radiol. 4, 313–319 (2007).

  • Carter, A. J., Davis, K. A., Evans, L. V. & Cone, D. C. Information loss in emergency medical services handover of trauma patients. Prehosp. Emerg. Care 13, 280–285 (2009).

  • Clynch, N. & Kellett, J. Medical documentation: part of the solution, or part of the problem? A narrative review of the literature on the time spent on and value of medical documentation. Int. J. Med. Inf. 84, 221–228 (2015).


  • Bruno, M. A., Walker, E. A. & Abujudeh, H. H. Understanding and confronting our mistakes: the epidemiology of error in radiology and strategies for error reduction. Radiographics 35, 1668–1676 (2015).

  • Srinivasa Babu, A. & Brooks, M. L. The malpractice liability of radiology reports: minimizing the risk. Radiographics 35, 547–554 (2015).

  • Wahl, B., Cossy-Gantner, A., Germann, S. & Schwalbe, N. R. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob. Health 3, e000798 (2018).

  • Ganaie, M. A., Hu, M., Malik, A. K., Tanveer, M. & Suganthan, P. N. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 115, 105151 (2022).


  • Weinberg, B. D., Richter, M. D., Champine, J. G., Morriss, M. C. & Browning, T. Radiology resident preliminary reporting in an independent call environment: multiyear assessment of volume, timeliness, and accuracy. J. Am. Coll. Radiol. 12, 95–100 (2015).

  • Taylor, A. G., Mielke, C. & Mongan, J. Automated detection of moderate and large pneumothorax on frontal chest X-rays using deep convolutional neural networks: a retrospective study. PLoS Med. 15, e1002697 (2018).

  • Cui, S. et al. Development and clinical application of deep learning model for lung nodules screening on CT images. Sci. Rep. 10, 13657 (2020).

  • Liu, W. N. et al. Study on detection rate of polyps and adenomas in artificial-intelligence-aided colonoscopy. Saudi J. Gastroenterol. 26, 13 (2020).

  • Rodríguez-Ruiz, A. et al. Detection of breast cancer with mammography: effect of an artificial intelligence support system. Radiology 290, 305–314 (2019).

  • Delrue, L. et al. in Comparative Interpretation of CT and Standard Radiography of the Chest (eds Baert, A. I. et al.) 27–49 (Springer, 2011).

  • Messina, P. et al. A survey on deep learning and explainability for automatic report generation from medical images. ACM Comput. Surv. 54, 1–40 (2022).

  • Mohsan, M. M. et al. Vision transformer and language model based radiology report generation. IEEE Access 11, 1814–1824 (2022).

  • Yang, B., Raza, A., Zou, Y. & Zhang, T. PCLmed at ImageCLEFmedical 2023: customizing general-purpose foundation models for medical report generation. CLEF (Working Notes) 1754–1766 (CLEF, 2023).

  • Nicolson, A., Dowling, J. & Koopman, B. Improving chest X-ray report generation by leveraging warm starting. Artif. Intell. Med. 144, 102633 (2023).

  • Ramesh, V., Chi, N. A. & Rajpurkar, P. Improving radiology report generation systems by removing hallucinated references to non-existent priors. Proc. Mach. Learn. Res. 193, 456–473 (2022). This study introduces a novel method that uses LLMs in report generation to rewrite generated reports, which subsequent studies have built on.


  • Ranjit, M., Ganapathy, G., Manuel, R. & Ganu, T. Retrieval augmented chest X-ray report generation using OpenAI GPT models. Proc. Mach. Learn. Res. 219, 650–666 (2023).


  • Tu, T. et al. Towards generalist biomedical AI. NEJM AI 1, AIoa2300138 (2024). This study is a prototypical example of a new generalist medical AI model that uses foundation models to expand the capabilities for report generation.

  • Moor, M. et al. Med-Flamingo: a multimodal medical few-shot learner. Proc. Mach. Learn. Res. 225, 353–367 (2023).

  • Zhao, Z. et al. ChatCAD+: toward a universal and reliable interactive CAD using LLMs. IEEE Trans. Med. Imaging 43, 3755–3766 (2024).

  • Lin, B. et al. Towards medical artificial general intelligence via knowledge-enhanced multimodal pretraining. Preprint at https://doi.org/10.48550/arXiv.2304.14204 (2023).

  • Lee, S., Kim, W. J., Chang, J. & Ye, J. C. LLM-CXR: instruction-finetuned LLM for CXR image understanding and generation. In The Twelfth International Conference on Learning Representations (ICLR, 2024).

  • Xu, S. et al. ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders. Preprint at https://doi.org/10.48550/arXiv.2308.01317 (2023).

  • Jeong, J. et al. Multimodal image-text matching improves retrieval-based chest X-ray report generation. In Medical Imaging with Deep Learning 978–990 (PMLR, 2024). This study introduces concepts that inspired our AI resident paradigm, including testing models in depth clinically prior to implementation.

  • Wu, C., Zhang, X., Zhang, Y., Wang, Y. & Xie, W. Towards generalist foundation model for radiology. Preprint at https://doi.org/10.48550/arXiv.2308.02463 (2023). This study proposes the RadBench benchmark, which is a good example of a measure designed specifically for foundation models in radiology.

  • Wu, C. et al. Can GPT-4V(ision) serve medical applications? Case studies on GPT-4V for multimodal medical diagnosis. Preprint at https://doi.org/10.48550/arXiv.2310.09909 (2023).

  • Senkaiahliyan, S. et al. GPT-4V(ision) unsuitable for clinical care and education: a clinician-evaluated assessment. Preprint at medRxiv https://doi.org/10.1101/2023.11.15.23298575 (2023).

  • Han, T. et al. Comparative analysis of GPT-4Vision, GPT-4 and open source LLMs in clinical diagnostic accuracy: a benchmark against human expertise. Preprint at medRxiv https://doi.org/10.1101/2023.11.03.23297957 (2023).

  • Bannur, S. et al. Learning to exploit temporal structure for biomedical vision–language processing. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 15016–15027 (Computer Vision Foundation, 2023).

  • Zhang, K. et al. Multi-task paired masking with alignment modeling for medical vision-language pre-training. IEEE Trans. Multimedia 26, 4706–4721 (2023).

  • Chen, Z., Diao, S., Wang, B., Li, G. & Wan, X. Towards unifying medical vision-and-language pre-training via soft prompts. In Proc. IEEE/CVF International Conference on Computer Vision 23403–23413 (Computer Vision Foundation, 2023).

  • Guo, Z. et al. Evaluating large language models: a comprehensive survey. Preprint at https://doi.org/10.48550/arXiv.2310.19736 (2023).

  • Raji, I. D. & Buolamwini, J. Actionable auditing: investigating the impact of publicly naming biased performance results of commercial AI products. In Proc. 2019 AAAI/ACM Conference on AI, Ethics, and Society (eds Conitzer, V. et al.) 429–435 (Association for Computing Machinery, 2019).

  • Rastogi, C., Tulio Ribeiro, M., King, N., Nori, H. & Amershi, S. Supporting human–AI collaboration in auditing LLMs with LLMs. In Proc. 2023 AAAI/ACM Conference on AI, Ethics, and Society (eds Rossi, F. et al.) 913–926 (Association for Computing Machinery, 2023).

  • Nori, H., King, N., McKinney, S. M., Carignan, D. & Horvitz, E. Capabilities of GPT-4 on medical challenge problems. Preprint at https://doi.org/10.48550/arXiv.2303.13375 (2023). GPT-4V is one of the most impactful recently introduced VLMs, and this study evaluates its medical capabilities.

  • Liu, J. et al. Qilin-Med-VL: towards Chinese large vision–language model for general healthcare. Preprint at https://doi.org/10.48550/arXiv.2310.17956 (2023).

  • Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat. Med. 28, 31–38 (2022).

  • Yan, B. et al. Style-aware radiology report generation with RadGraph and few-shot prompting. Empir. Method Nat. Lang. Process. https://doi.org/10.18653/v1/2023.findings-emnlp.977 (2023).

  • Chen, Q. et al. Act like a radiologist: radiology report generation across anatomical regions. In Proceedings of the Asian Conference on Computer Vision 36–52 (Association for Computing Machinery, 2024).

  • Nicolson, A., Dowling, J., Anderson, D. & Koopman, B. Longitudinal data and a semantic similarity reward for chest X-ray report generation. Inform. Med. Unlocked 50, 101585 (2024).

  • Hyland, S. L. et al. MAIRA-1: a specialised large multimodal model for radiology report generation. Preprint at https://doi.org/10.48550/arXiv.2311.13668 (2023).

  • Hou, W., Cheng, Y., Xu, K., Li, W. & Liu, J. RECAP: towards precise radiology report generation via dynamic disease progression reasoning. Empir. Method Nat. Lang. Process. https://doi.org/10.18653/v1/2023.findings-emnlp.140 (2023).

  • Shang, C. et al. MATNet: exploiting multi-modal features for radiology report generation. IEEE Signal Process. Lett. 29, 2692–2696 (2022).

  • Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023). This review introduces generalist medical AI, which we extend to medical report generation.

  • Gemini Team. Gemini: A Family of Highly Capable Multimodal Models (Google DeepMind, 2023). Gemini is another very impactful multimodal foundation model that has great potential within medical report generation.

  • Yue, X. et al. MMMU: a massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 9556–9567 (Computer Vision Foundation, 2024).

  • Ni, M. et al. M3P: learning universal representations via multitask multilingual multimodal pre-training. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 3977–3986 (Computer Vision Foundation, 2021).

  • Nagrani, A. et al. Attention bottlenecks for multimodal fusion. Adv. Neural Inf. Process. Syst. 34, 14200–14213 (2021).

  • Chen, Y. J. et al. Representative image feature extraction via contrastive learning pretraining for chest X-ray report generation. Preprint at https://doi.org/10.48550/arXiv.2209.01604 (2022).

  • Shu, C. et al. MITER: medical image–text joint adaptive pretraining with multi-level contrastive learning. Expert Syst. Appl. 238, 121526 (2024).


  • Tanida, T., Müller, P., Kaissis, G. & Rueckert, D. Interactive and explainable region-guided radiology report generation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 7433–7442 (Computer Vision Foundation, 2023). This is one of the few report generation studies that has explicitly explored multimodal outputs as a possibility in improving the interpretability of generated reports.

  • Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 33, 9459–9474 (2020).

  • Yang, S. et al. Radiology report generation with a learned knowledge base and multi-modal alignment. Med. Image Anal. 86, 102798 (2023).

  • Chen, Z., Shen, Y., Song, Y. & Wan, X. Cross-modal memory networks for radiology report generation. Preprint at https://doi.org/10.48550/arXiv.2204.13258 (2022).

  • Li, M. et al. Dynamic graph enhanced contrastive learning for chest X-ray report generation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 3334–3343 (Computer Vision Foundation, 2023).

  • Huang, Z., Zhang, X. & Zhang, S. KiUT: knowledge-injected U-transformer for radiology report generation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 19809–19818 (Computer Vision Foundation, 2023).

  • Zhang, K. et al. Semi-supervised medical report generation via graph-guided hybrid feature consistency. IEEE Trans. Multimed. 26, 904–915 (2024).

  • Hou, W., Xu, K., Cheng, Y., Li, W. & Liu, J. ORGAN: Observation-Guided Radiology Report Generation via Tree Reasoning. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds Rogers, A., Boyd-Graber, J. & Okazaki, N.) 8108–8122 (ACL, 2023).

  • Wang, Y., Lin, Z. & Dong, H. Rethinking medical report generation: disease revealing enhancement with knowledge graph. In Proceedings of the 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH) at ICML (ICML, 2023).

  • Yan, S. et al. Attributed abnormality graph embedding for clinically accurate X-ray report generation. IEEE Trans. Med. Imaging 42, 2211–2222 (2023).

  • Kale, K., Bhattacharyya, P., Gune, M., Shetty, A. & Lawyer, R. KGVL-BART: knowledge graph augmented visual language BART for radiology report generation. In Proc. 17th Conference of the European Chapter of the Association for Computational Linguistics (eds Vlachos, A. & Augenstein, I.) 3393–3403 (Association for Computational Linguistics, 2023).

  • Cao, Y. et al. MMTN: multi-modal memory transformer network for image-report consistent medical report generation. In Proc. AAAI Conference on Artificial Intelligence Vol. 37 (eds Williams, B., Chen, Y. & Neville, J.) 277–285 (Association for Computing Machinery, 2023).

  • Wang, L. et al. An inclusive task-aware framework for radiology report generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Wang, L. et al.) 568–577 (Springer, 2022).

  • Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80 (2008).

  • Johnson, A. E. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6, 317 (2019). MIMIC-CXR is one of the largest and most commonly used datasets to develop medical report generation models, and there is a need for more such datasets across different specialties and image modalities.

  • Jain, S. et al. RadGraph: extracting clinical entities and relations from radiology reports. In Advances in Neural Information Processing Systems, Datasets and Benchmarks Track Vol. 35 (NeurIPS, 2021). RadGraph is one of the most popular knowledge graphs used to incorporate external knowledge and boost clinical accuracy for radiology report generation models.

  • Yang, S., Wu, X., Ge, S., Zhou, S. K. & Xiao, L. Knowledge matters: chest radiology report generation with general and specific knowledge. Med. Image Anal. 80, 102510 (2022).

  • Kale, K. et al. “Knowledge is power”: constructing knowledge graph of abdominal organs and using them for automatic radiology report generation. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track) (eds Sitaram, S. et al.) 11–24 (Association for Computational Linguistics, 2023).

  • Zhang, J. et al. A novel deep learning model for medical report generation by inter-intra information calibration. IEEE J. Biomed. Health Inform. 27, 5110–5121 (2023).

  • Moon, J. H., Lee, H., Shin, W., Kim, Y. H. & Choi, E. Multi-modal understanding and generation for medical images and text via vision-language pre-training. IEEE J. Biomed. Health Inform. 26, 6070–6080 (2022).

  • Zhu, Q. et al. Utilizing longitudinal chest X-rays and reports to pre-fill radiology reports. In International Conference on Medical Image Computing and Computer-Assisted Intervention 189–198 (Springer, 2023).

  • Kale, K., Bhattacharyya, P. & Jadhav, K. Replace and report: NLP assisted radiology report generation. In Findings of the Association for Computational Linguistics (eds Rogers, A., Boyd-Graber, J. & Okazaki, N.) 10731–10742 (ACL, 2023).

  • Li, J., Li, S., Hu, Y. & Tao, H. A self-guided framework for radiology report generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Wang, L. et al.) 588–598 (Springer, 2022).

  • Xu, D. et al. Vision–knowledge fusion model for multi-domain medical report generation. Inf. Fusion 97, 101817 (2023).

  • Kaur, N. & Mittal, A. CheXPrune: sparse chest X-ray report generation model using multi-attention and one-shot global pruning. J. Ambient Intell. Humaniz. Comput. 14, 7485–7497 (2023).

  • You, J., Li, D., Okumura, M. & Suzuki, K. JPG–Jointly learn to align: automated disease prediction and radiology report generation. In Proc. 29th International Conference on Computational Linguistics (eds Calzolari, N. et al.) 5989–6001 (International Committee on Computational Linguistics, 2022).

  • Saini, T., Ajad, A. & Kumar, N. Deep ensemble architecture for knee osteoarthritis severity prediction and report generation. In 2023 5th International Conference on Recent Advances in Information Technology 1–6 (IEEE, 2023).

  • Jia, X. et al. Few-shot radiology report generation via knowledge transfer and multi-modal alignment. In 2022 IEEE International Conference on Bioinformatics and Biomedicine 1574–1579 (IEEE, 2022).

  • Sun, J., Wei, D., Wang, L. & Zheng, Y. Lesion guided explainable few weak-shot medical report generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Wang, L. et al.) 615–625 (Springer, 2022).

  • Shi, J., Wang, S., Wang, R. & Ma, S. AIMNet: adaptive image-tag merging network for automatic medical report generation. In IEEE International Conference on Acoustics, Speech and Signal Processing 7737–7741 (IEEE, 2022).

  • Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).

  • Sun, Y. et al. PathAsst: a generative foundation AI assistant towards artificial general intelligence of pathology. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 38 5034–5042 (AAAI, 2024).

  • Zhou, J. et al. Pre-trained multimodal large language model enhances dermatological diagnosis using SkinGPT-4. Nat. Commun. 15, 5649 (2024).

  • Lu, M. Y. et al. A multimodal generative AI copilot for human pathology. Nature 634, 466–473 (2024).

  • Lin, W. et al. PMC-CLIP: contrastive language-image pre-training using biomedical documents. In International Conference on Medical Image Computing and Computer-Assisted Intervention 525–536 (Springer, 2023).

  • Huisman, M., Joye, S. & Biltereyst, D. Searching for health: Doctor Google and the shifting dynamics of the middle-aged and older adult patient–physician relationship and interaction. J. Aging Health 32, 998–1007 (2020).

  • Van Riel, N., Auwerx, K., Debbaut, P., Van Hees, S. & Schoenmakers, B. The effect of Dr Google on doctor–patient encounters in primary care: a quantitative, observational, cross-sectional study. BJGP Open 1, bjgpopen17X100833 (2017).

  • Stewart, M. A. What is a successful doctor-patient interview? A study of interactions and outcomes. Soc. Sci. Med. 19, 167–175 (1984).

  • Street, R. L. Jr, Makoul, G., Arora, N. K. & Epstein, R. M. How does communication heal? Pathways linking clinician–patient communication to health outcomes. Patient Educ. Couns. 74, 295–301 (2009).

  • Ende, J. Feedback in clinical medical education. JAMA 250, 777–781 (1983).

  • Hewson, M. G. & Little, M. L. Giving feedback in medical education: verification of recommended techniques. J. Gen. Intern. Med. 13, 111–116 (1998).

  • Fischetti, C. et al. The evolving importance of artificial intelligence and radiology in medical trainee education. Acad. Radiol. 29, S70–S75 (2022).

  • Clusmann, J. et al. The future landscape of large language models in medicine. Commun. Med. 3, 141 (2023).

  • Tanno, R., Barrett, D. G. T., Sellergren, A. et al. Collaboration between clinicians and vision–language models in radiology report generation. Nat. Med. 31, 599–608 (2025).

  • Christiano, P. F. et al. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, 2017).

  • Rafailov, R. et al. Direct preference optimization: your language model is secretly a reward model. In Proc. 37th International Conference on Neural Information Processing Systems (eds Oh, A. et al.) 53728–53741 (Association for Computing Machinery, 2024).

  • Pellegrini, C., Özsoy, E., Busam, B., Navab, N. & Keicher, M. RaDialog: a large vision–language model for radiology report generation and conversational assistance. Preprint at https://doi.org/10.48550/arXiv.2311.18681 (2023).

  • De Grave, A. J., Cai, Z. R., Janizek, J. D., Daneshjou, R. & Lee, S.-I. Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-023-01160-9 (2023).

  • Li, M., Liu, R., Wang, F., Chang, X. & Liang, X. Auxiliary signal-guided knowledge encoder–decoder for medical report generation. World Wide Web 26, 253–270 (2023).

  • Tang, Y., Yang, H., Zhang, L. & Yuan, Y. Work like a doctor: unifying scan localizer and dynamic generator for automated computed tomography report generation. Expert Syst. Appl. 237, 121442 (2024).


  • Voutharoja, B. P., Wang, L. & Zhou, L. Automatic radiology report generation by learning with increasingly hard negatives. In ECAI 2023 2427–2434 (IOS Press, 2023).

  • Papineni, K., Roukos, S., Ward, T. & Zhu, W. J. BLEU: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics (ed. Isabel, P.) 311–318 (Association for Computing Machinery, 2002).

  • Banerjee, S. & Lavie, A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proc. ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization (eds Callison-Burch, C. et al.) 65–72 (Association for Computing Machinery, 2005).

  • Vedantam, R., Lawrence Zitnick, C. & Parikh, D. CIDEr: consensus-based image description evaluation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4566–4575 (IEEE, 2015).

  • Lin, C. Y. ROUGE: a package for automatic evaluation of summaries. In Proc. Workshop on Text Summarization Branches Out 74–81 (Association for Computational Linguistics, 2004).

  • Chaves, J. M. Z. et al. RaLEs: a benchmark for radiology language evaluations. In 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (eds Oh, A. et al.) (Association for Computing Machinery, 2023).

  • Yu, F. et al. Evaluating progress in automatic chest X-ray radiology report generation. Patterns 4, 100802 (2023).

  • Huang, J. et al. Generative artificial intelligence for chest radiograph interpretation in the emergency department. JAMA Netw. Open 6, e2336100 (2023).

  • Chen, L., Zaharia, M. & Zou, J. How is ChatGPT’s behavior changing over time? Harv. Data Sci. Rev. https://doi.org/10.1162/99608f92.5317da47 (2024).

  • Tu, S. et al. ChatLog: recording and analyzing ChatGPT across time. Preprint at https://doi.org/10.48550/arXiv.2304.14106 (2023).

  • Shakarian, P., Koyyalamudi, A., Ngu, N. & Mareedu, L. An independent evaluation of ChatGPT on mathematical word problems (MWP). In Proceedings of the AAAI 2023 Spring Symposium on Challenges Requiring the Combination of Machine Learning and Knowledge Engineering (AAAI, 2023).

  • Finlayson, S. G. et al. Adversarial attacks on medical machine learning. Science 363, 1287–1289 (2019).

  • Zou, A., Wang, Z., Kolter, J. Z. & Fredrikson, M. Universal and transferable adversarial attacks on aligned language models. Preprint at https://doi.org/10.48550/arXiv.2307.15043 (2023).

  • Xu, X. et al. An LLM can fool itself: a prompt-based adversarial attack. In The Twelfth International Conference on Learning Representations (ICLR, 2024).

  • Challen, R. et al. Artificial intelligence, bias and clinical safety. BMJ Qual. Saf. 28, 231–237 (2019).

  • Parasuraman, R. & Manzey, D. H. Complacency and bias in human use of automation: an attentional integration. Hum. Factors 52, 381–410 (2010).

  • Saenz, A., Chen, E., Marklund, H. & Rajpurkar, P. The MAIDA initiative: establishing a framework for global medical-imaging data sharing. Lancet Digit. Health 6, e6–e8 (2024).

  • Seyyed-Kalantari, L., Zhang, H., McDermott, M. B., Chen, I. Y. & Ghassemi, M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat. Med. 27, 2176–2182 (2021).

  • Jussupow, E., Spohrer, K., Heinzl, A. & Gawlitza, J. Augmenting medical diagnosis decisions? An investigation into physicians’ decision-making process with artificial intelligence. Inf. Syst. Res. 32, 713–735 (2021).


  • Kempt, H., Heilinger, J. C. & Nagel, S. K. “I’m afraid I can’t let you do that, Doctor”: meaningful disagreements with AI in medical contexts. AI Soc. 38, 1407–1414 (2023).


  • Montemayor, C., Halpern, J. & Fairweather, A. In principle obstacles for empathic AI: why we can’t replace human empathy in healthcare. AI Soc. 37, 1353–1359 (2022).

  • Mittermaier, M., Raza, M. M. & Kvedar, J. C. Bias in AI-based models for medical applications: challenges and mitigation strategies. npj Digit. Med. 6, 113 (2023).

  • Roselli, D., Matthews, J. & Talagala, N. Managing bias in AI. In Companion Proc. 2019 World Wide Web Conference (eds Liu, L. & White, R.) 539–544 (Association for Computing Machinery, 2019).

  • Tang, Y., Tang, Y., Zhu, Y., Xiao, J. & Summers, R. M. A disentangled generative model for disease decomposition in chest X-rays via normal image synthesis. Med. Image Anal. 67, 101839 (2021).

  • Liu, C., Shah, A., Bai, W. & Arcucci, R. Utilizing synthetic data for medical vision-language pre-training: bypassing the need for real images. Preprint at https://doi.org/10.48550/arXiv.2310.07027 (2023).

  • Bridge, P., Fielding, A., Rowntree, P. & Pullar, A. Intraobserver variability: should we worry? J. Med. Imag. Rad. Sci. 47, 217–220 (2016).


  • Karimi, D., Dou, H., Warfield, S. K. & Gholipour, A. Deep learning with noisy labels: exploring techniques and remedies in medical image analysis. Med. Image Anal. 65, 101759 (2020).

  • Mahapatra, D., Bozorgtabar, B. & Ge, Z. Medical image classification using generalized zero shot learning. In Proc. IEEE/CVF International Conference on Computer Vision (eds Berg, T. et al.) 3344–3353 (IEEE, 2021).

  • Xian, Y., Schiele, B. & Akata, Z. Zero-shot learning—the good, the bad and the ugly. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4582–4591 (IEEE, 2017).

  • Wang, Z., Zhou, L., Wang, L. & Li, X. A self-boosting framework for automated radiographic report generation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Berg, T. et al.) 2433–2442 (IEEE, 2021).

  • Shi, Y., Ji, J., Zhang, X., Qu, L. & Liu, Y. Granularity matters: pathological graph-driven cross-modal alignment for brain CT report generation. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing (eds Bouamor, H. et al.) 6617–6630 (Association for Computational Linguistics, 2023).

  • Liu, C. F. et al. Automatic comprehensive radiological reports for clinical acute stroke MRIs. Commun. Med. 3, 95 (2023).

  • Han, Z. et al. Unifying neural learning and symbolic reasoning for spinal medical report generation. Med. Image Anal. 67, 101872 (2021).

  • Han, Z., Wei, B., Leung, S., Chung, J. & Li, S. Towards automatic report generation in spine radiology using weakly supervised framework. In Medical Image Computing and Computer Assisted Intervention 2018: 21st International Conference (eds Frangi, A. F. et al.) 185–193 (Springer, 2018).

  • Lei, J. et al. Unibrain: universal brain MRI diagnosis with hierarchical knowledge-enhanced pre-training. Preprint at https://doi.org/10.48550/arXiv.2309.06828 (2023). One of the few studies that has explored text generation based on 3D medical images, especially MRI scans, which proposed an expansive dataset consisting of MRI and text pairings.

  • Wu, F. et al. AGNet: automatic generation network for skin imaging reports. Comput. Biol. Med. 141, 105037 (2022).

  • Li, M. et al. Cross-modal clinical graph transformer for ophthalmic report generation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 20656–20665 (IEEE, 2022).

  • Huang, J. H. et al. DeepOpht: medical report generation for retinal images via deep models and visual explanation. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 2442–2452 (IEEE, 2021).

  • Li, M. et al. FFA-IR: towards an explainable and reliable medical report generation benchmark. In 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) (Association for Computing Machinery, 2021).

  • Topol, E. Why doctors should organize. The New Yorker https://www.newyorker.com/culture/annals-of-inquiry/why-doctors-should-organize (5 August 2019).
