Assessing diagnostic performance of multimodal LLMs and a custom convolutional neural network in tooth-level caries detection and localization

Document Type

Article

Department

Dental-oral, Maxillo-facial Surgery

Abstract

Background: Artificial intelligence (AI) is reshaping dental diagnostics through automated image interpretation. While Convolutional Neural Networks (CNNs) achieve high accuracy through domain-specific training, multimodal Large Language Models (LLMs) such as ChatGPT-4o and Gemini 2.5 Flash now offer visual-reasoning capabilities without task-specific fine-tuning.
Objective: This study compared the diagnostic performance of these LLMs with a custom CNN for detecting and localizing dental caries on intraoral images.
Methods: This cross-sectional diagnostic accuracy study used 22 occlusal-view intraoral images. ChatGPT-4o, Gemini 2.5 Flash, and a YOLOv5s-based CNN analyzed each image for caries detection and localization. Quantitative evaluation assessed decay detection using accuracy, sensitivity, specificity, precision (Positive Predictive Value, PPV), Negative Predictive Value (NPV), and F1 score. Inter-model differences were analyzed using McNemar's test. Additionally, specialist dentists performed a descriptive qualitative evaluation, rating each model's output for realism, diagnostic accuracy, bounding-box precision, and absence of unnecessary annotations on a 3-point Likert scale.
Results: The CNN achieved the highest diagnostic accuracy (97.2%), sensitivity (86.7%), and F1 score (88.0%). Gemini 2.5 Flash outperformed ChatGPT-4o in sensitivity (76.4% vs. 66.2%) and F1 score (74.3% vs. 68.7%). Overall, the CNN's performance was significantly superior (p < 0.001), whereas no significant difference was found between the two LLMs (p = 0.541). Qualitatively, the CNN scored best for realism (90.9%), decay accuracy (79.5%), and bounding-box precision (93.1%).
Conclusion: CNNs provide superior accuracy for caries localization compared with multimodal LLMs. However, LLMs demonstrate potential for generating clinically interpretable diagnostic summaries. Hybrid systems integrating CNN-based detection with LLM-driven reasoning may enhance decision-making and improve efficiency in dental diagnostic workflows.
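The quantitative metrics named in the Methods follow the standard 2x2 confusion-matrix definitions, and McNemar's test compares two models scored on the same teeth using only the discordant pairs. The sketch below illustrates both under stated assumptions; it is not the authors' analysis code. The function names are hypothetical, the counts in the usage block are invented for illustration, and the choice of the exact binomial form of McNemar's test as the default is an assumption, since the abstract does not say which variant was used.

```python
from math import comb, erfc, sqrt


def _ratio(num: int, den: int) -> float:
    """Safe division: returns 0.0 when a metric's denominator is zero."""
    return num / den if den else 0.0


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard 2x2 metrics, treating 'carious tooth' as the positive class."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),      # recall, true-positive rate
        "specificity": _ratio(tn, tn + fp),      # true-negative rate
        "ppv": _ratio(tp, tp + fp),              # precision = positive predictive value
        "npv": _ratio(tn, tn + fn),              # negative predictive value
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


def mcnemar(b: int, c: int, exact: bool = True) -> float:
    """Two-sided McNemar p-value for two models scored on the same teeth.

    b: teeth model A got right and model B got wrong
    c: teeth model A got wrong and model B got right
    Concordant pairs (both right or both wrong) do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    if exact:
        # Exact binomial form (assumed default): under H0 discordant pairs split 50/50.
        tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
        return min(1.0, 2 * tail)
    # Chi-square form with continuity correction, 1 degree of freedom;
    # the chi-square(1) survival function equals erfc(sqrt(x / 2)).
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    return erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    print(diagnostic_metrics(tp=8, fp=2, tn=85, fn=5))
    print(mcnemar(b=10, c=2))
    print(mcnemar(b=10, c=2, exact=False))
```

Running the module prints the metric dictionary and both p-values for the invented counts. With three models, each pair (CNN vs. each LLM, and the two LLMs against each other) would be a separate McNemar comparison; the abstract does not state whether any adjustment for multiple comparisons was applied.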

Comments

Volume, issue, and pagination are not provided by the author/publisher.

Publication (Name of Journal)

BMC Oral Health

DOI

10.1186/s12903-026-08590-2

Recommended Citation

Khalid, A., Nooruddin, A., Naveed, N., Ahmed, S. F., Adnan, N., Suresh, S., Lal, A., Umer, F. (2026). Assessing diagnostic performance of multimodal LLMs and a custom convolutional neural network in tooth-level caries detection and localization. BMC Oral Health.
Available at: https://ecommons.aku.edu/pakistan_fhs_mc_surg_dent_oral_maxillofac/306

eCommons@AKU

Section of Dental-Oral Maxillofacial Surgery
