Accuracy of Artificial Intelligence Chatbot Answers in Professional Voice Patient Education
Introduction/Objective: Artificial intelligence (AI) is everywhere, and every health professional needs to know how to use AI tools, how to judge their output, and how to instruct patients on using them. Patients often turn to chatbots built on large language models to answer everyday questions, including health questions. Regarding professional voice users, little is known about how accurate the answers given by chatbots are. The goal of this study was to rate chatbot answers to daily questions commonly asked by this patient population across several criteria.
Methods: Three AI chatbots were used (Gemini, Claude, and ChatGPT). Eleven questions were developed based on the authors' knowledge of daily concerns in professional voice care. The answer to each question was recorded, and one file was generated for each model. Two raters (one fellowship-trained laryngologist and one speech-language pathologist, both with more than 20 years of experience in professional voice care) blindly rated the 11 answers on precision, detail, safety, and adequacy to patient knowledge. Ratings used a 1-5 Likert scale, with 5 being the highest (best) score. Descriptive and inferential statistical analyses were performed.
Results: The mean overall score across the three chatbots was 3.5 (SD 1.32), between neutral and agree. Claude had the highest mean overall score (4.5, SD 1.0), while ChatGPT had the lowest (2.6, SD 1.1), between disagree and neutral. The highest criterion scores across all models were related to precision (mean 3.7, SD 1.2) and to how precise the answer was (mean 3.7, SD 1.4). The Kruskal-Wallis test showed a significant difference among the three platforms (p = 0.0001), and the difference between each pair of platforms was significant as well (p < 0.0001).
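As a minimal illustration of the kind of analysis reported above, the sketch below runs a Kruskal-Wallis test followed by pairwise Mann-Whitney U comparisons with a Bonferroni correction on hypothetical 1-5 Likert ratings. The rating values, the way scores are pooled, and the choice of post-hoc procedure are assumptions for illustration only, not the study's actual data or analysis plan.

```python
# Minimal sketch (Python with SciPy) of a Kruskal-Wallis test with pairwise post-hoc comparisons.
# The ratings below are illustrative placeholders, not the study data.
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical 1-5 Likert ratings pooled across questions, raters, and criteria.
ratings = {
    "Claude":  [5, 4, 5, 4, 5, 3, 5, 4],
    "ChatGPT": [3, 2, 3, 2, 3, 2, 3, 3],
    "Gemini":  [4, 3, 3, 4, 3, 4, 3, 3],
}

# Omnibus Kruskal-Wallis test across the three chatbots.
h_stat, p_overall = kruskal(*ratings.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_overall:.4f}")

# Pairwise post-hoc comparisons with a Bonferroni correction.
pairs = list(combinations(ratings, 2))
for a, b in pairs:
    _, p = mannwhitneyu(ratings[a], ratings[b], alternative="two-sided")
    print(f"{a} vs {b}: adjusted p = {min(p * len(pairs), 1.0):.4f}")
```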
Conclusion: Among the three chatbots analyzed, Claude was considered the best model when precision, detail, safety, and adequacy to patients' knowledge were considered. It is important to guide patients on how and when to use AI tools for health-related questions.