Assessing diagnostic performance of multimodal LLMs and a custom convolutional neural network in tooth-level caries detection and localization
Document Type
Article
Department
Dental-oral, Maxillo-facial Surgery
Abstract
Background: Artificial Intelligence is reshaping dental diagnostics through automated interpretation of images. While Convolutional Neural Networks (CNNs) demonstrate high accuracy via domain-specific training, multimodal Large Language Models (LLMs) such as ChatGPT-4o and Gemini 2.5 Flash have recently acquired visual-reasoning capabilities without task-specific fine-tuning.
Objective: This study compared the diagnostic performance of these LLMs with a custom CNN for detecting and localizing dental caries on intraoral images.
Methods: This cross-sectional diagnostic accuracy study used 22 occlusal-view intraoral images. ChatGPT-4o, Gemini 2.5 Flash, and a YOLOv5s-based CNN analyzed each image for caries detection and localization. Quantitative evaluation assessed decay detection using accuracy, sensitivity, specificity, precision, Positive Predictive Value (PPV), Negative Predictive Value (NPV), and F1 score. Inter-model differences were analyzed using McNemar's test. Additionally, a descriptive qualitative evaluation was performed by specialist dentists, who rated each model's output for realism, diagnostic accuracy, bounding-box precision, and absence of unnecessary annotations using a 3-point Likert scale.
Results: The CNN achieved the highest diagnostic accuracy (97.2%), sensitivity (86.7%), and F1 score (88.0%). Gemini 2.5 Flash scored higher than ChatGPT-4o in sensitivity (76.4% vs. 66.2%) and F1 score (74.3% vs. 68.7%). Overall, the CNN's performance was significantly superior (p < 0.001), whereas no significant difference was found between the two LLMs (p = 0.541). Qualitatively, the CNN scored best for realism (90.9%), decay accuracy (79.5%), and bounding-box precision (93.1%).
Conclusion: CNNs provide superior accuracy for caries localization compared with multimodal LLMs. However, LLMs demonstrate potential for generating clinically interpretable diagnostic summaries. Hybrid systems integrating CNN-based detection with LLM-driven reasoning may enhance decision-making and improve efficiency in dental diagnostic workflows.
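As an illustration of the quantitative evaluation named in the Methods, the following minimal Python sketch computes the listed metrics from confusion-matrix counts and a two-sided exact McNemar p-value from discordant pair counts. It is not the authors' code. The function names and all counts in the usage note are hypothetical, the abstract does not state which variant of McNemar's test was used (the exact binomial form is assumed here), and the sketch assumes every denominator is non-zero.

```python
from math import comb


def diagnostic_metrics(tp, fp, tn, fn):
    """Return standard diagnostic metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (hypothetical helper, not from the study).
    """
    sensitivity = tp / (tp + fn)      # true positive rate (recall)
    specificity = tn / (tn + fp)      # true negative rate
    ppv = tp / (tp + fp)              # positive predictive value, same as precision
    npv = tn / (tn + fn)              # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def mcnemar_exact(b, c):
    """Two-sided exact (binomial) McNemar p-value.

    b: cases model A got right and model B got wrong
    c: cases model A got wrong and model B got right
    """
    n = b + c
    if n == 0:
        return 1.0                    # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # One-sided binomial tail P(X <= k) with p = 0.5, doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With made-up counts, `diagnostic_metrics(8, 2, 88, 2)` gives a sensitivity of 0.8 and an accuracy of 0.96, and `mcnemar_exact(0, 10)` gives a p-value of about 0.002. These numbers are illustrative only and are unrelated to the study's results.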
Publication (Name of Journal)
BMC Oral Health
DOI
10.1186/s12903-026-08590-2
Recommended Citation
Khalid, A., Nooruddin, A., Naveed, N., Ahmed, S. F., Adnan, N., Suresh, S., Lal, A., Umer, F. (2026). Assessing diagnostic performance of multimodal LLMs and a custom convolutional neural network in tooth-level caries detection and localization. BMC Oral Health.
Available at:
https://ecommons.aku.edu/pakistan_fhs_mc_surg_dent_oral_maxillofac/306
Comments
Volume, issue, and pagination are not provided by the author/publisher.