Assessing the quality of ChatGPT’s responses to questions related to radiofrequency ablation for varicose veins

Document Type

Article

Department

Surgery; General Surgery; Vascular Surgery

Abstract

Objective: This study aimed to evaluate the accuracy and reproducibility of the information ChatGPT provides in response to frequently asked questions (FAQs) about radiofrequency ablation (RFA) for varicose veins.
Methods: This cross-sectional study was conducted at The Aga Khan University Hospital, Karachi, Pakistan. A set of 18 FAQs regarding RFA for varicose veins was compiled from credible online sources and presented to ChatGPT twice, each time in a separate session opened with the 'new chat' option. Twelve experienced vascular surgeons (with over 2 years of experience and at least 20 RFA procedures performed annually) independently evaluated the accuracy of the responses using a 4-point Likert scale and assessed their reproducibility.
Results: Most evaluators were male (n=10/12, 83.3%), with an average of 12.3 ± 6.2 years of experience as a vascular surgeon. Six (50.0%) evaluators were from the UK, followed by three (25.0%) from Saudi Arabia, two (16.7%) from Pakistan, and one (8.3%) from the USA. Of the 216 accuracy grades, most were 'comprehensive' (n=87/216, 40.3%) or 'accurate but insufficient' (n=70/216, 32.4%), whereas only 17.1% (n=37/216) were 'a mixture of both accurate and inaccurate information' and 10.2% (n=22/216) were 'entirely inaccurate'. Overall, 89.8% (n=194/216) of the responses were deemed reproducible. Of the total responses, 70.4% (n=152/216) were classified as 'good quality' and 'reproducible'. The remaining responses were 'poor quality', with 19.4% (n=42/216) 'reproducible' and 10.2% (n=22/216) 'non-reproducible'. Inter-rater agreement among the vascular surgeons for the overall responses was negligible and not statistically significant (Fleiss' kappa: -0.028, p=0.131).
Conclusion: ChatGPT provided generally accurate and reproducible information on RFA for varicose veins; however, variability in response quality and limited inter-rater reliability highlight the need for further improvement. While it has the potential to enhance patient education and support healthcare decision-making, improvements in its training, validation, and transparency, along with mechanisms to address inaccurate or incomplete information, are essential.
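
For readers who want to check the arithmetic behind the Results, the following is a minimal Python sketch, not the authors' analysis code. It recomputes the grade percentages from the counts reported above and shows the standard Fleiss' kappa formula. The function names (`percentages`, `fleiss_kappa`) and the two small rating tables are hypothetical illustrations; the study's per-question rating table is not given in the abstract, so the reported kappa cannot be reproduced here.

```python
# Accuracy-grade counts as reported in the Results
# (they sum to 216 = 18 FAQs x 12 evaluators).
grade_counts = {
    "comprehensive": 87,
    "accurate but insufficient": 70,
    "mixture of accurate and inaccurate": 37,
    "entirely inaccurate": 22,
}


def percentages(counts):
    """Return each count as a percentage of the total, rounded to 1 decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def fleiss_kappa(table):
    """Fleiss' kappa for a table where table[i][j] is the number of raters
    who assigned item i to category j. Every row must sum to the same
    number of raters."""
    n_items = len(table)
    n_raters = sum(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(table[0])

    # Share of all ratings that fall in each category.
    p_j = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy tables (NOT study data): 2 items, 4 raters, 2 categories.
perfect_agreement = [[4, 0], [0, 4]]
evenly_split = [[2, 2], [2, 2]]

print(percentages(grade_counts))
print(fleiss_kappa(perfect_agreement))  # complete agreement gives kappa = 1
print(fleiss_kappa(evenly_split))       # raters split evenly: kappa below 0
```

A kappa near or below zero, as in the evenly split toy table, means the raters agreed no more often than chance would predict, which is how the slightly negative value reported in the Results is usually read.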

Comments

Volume, issue, and pagination are not provided by the author/publisher.

Publication (Name of Journal)

Journal of Vascular Surgery: Venous and Lymphatic Disorders

DOI

doi.org/10.1016/j.jvsv.2024.101985
