:partners: Authors

:yale: Zhenyue Qin - Yale University*

:ic_logo: Yu Yin - Imperial College London*

:anu: Dylan Campbell - Australian National University

:uga_logo: Xuansheng Wu - University of Georgia

:nus_logo: Ke Zou - National University of Singapore

:nus_logo: Yih-Chung Tham - National University of Singapore

:uga_logo: Ninghao Liu - University of Georgia

:rmit_logo: Xiuzhen Zhang - RMIT University

:yale: Qingyu Chen - Yale University

:chinese-knot: Links

:paper: Paper

:database: Dataset

:poster: Poster

Poster.pdf

:presentation: Presentation

NAACL-2025-LMOD-Presentation.mp4

:quote: Citation

@inproceedings{qin2025lmod,
    title = "LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models",
    author = "Qin, Zhenyue  and
      Yin, Yu  and
      Campbell, Dylan  and
      Wu, Xuansheng  and
      Zou, Ke  and
      Liu, Ninghao  and
      Tham, Yih Chung  and
      Zhang, Xiuzhen  and
      Chen, Qingyu",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.135/",
    pages = "2501--2522",
    ISBN = "979-8-89176-195-7",
    abstract = "The prevalence of vision-threatening eye diseases is a significant global burden, with many cases remaining undiagnosed or diagnosed too late for effective treatment. Large vision-language models (LVLMs) have the potential to assist in understanding anatomical information, diagnosing eye diseases, and drafting interpretations and follow-up plans, thereby reducing the burden on clinicians and improving access to eye care. However, limited benchmarks are available to assess LVLMs' performance in ophthalmology-specific applications. In this study, we introduce LMOD, a large-scale multimodal ophthalmology benchmark consisting of 21,993 instances across (1) five ophthalmic imaging modalities: optical coherence tomography, color fundus photographs, scanning laser ophthalmoscopy, lens photographs, and surgical scenes; (2) free-text, demographic, and disease biomarker information; and (3) primary ophthalmology-specific applications such as anatomical information understanding, disease diagnosis, and subgroup analysis. In addition, we benchmarked 13 state-of-the-art LVLM representatives from closed-source, open-source, and medical domains. The results demonstrate a significant performance drop for LVLMs in ophthalmology compared to other domains. Systematic error analysis further identified six major failure modes: misclassification, failure to abstain, inconsistent reasoning, hallucination, assertions without justification, and lack of domain-specific knowledge. In contrast, supervised neural networks specifically trained on these tasks as baselines demonstrated high accuracy. These findings underscore the pressing need for benchmarks in the development and validation of ophthalmology-specific LVLMs."
}

:medal-2: Acknowledgments

The icons were created by small.smiles, Freepik, Aranagraphics, Smashicons, heisenberg_jr, and Vectors Market.