Can AI Detect Voice Disorders? A Systematic Review of Accuracy Outcomes


Objective: This systematic review aimed to identify and evaluate artificial intelligence (AI) approaches used to detect non-neurological voice disorders. Specifically, we focused on model accuracy in distinguishing pathological from healthy voice samples. A secondary objective was to illustrate how AI tools can be used to support the systematic review process itself.

Methods / Design: Studies were included if they implemented an AI-based method to distinguish voice samples with primary voice disorders from healthy voice samples. Five databases were searched: PubMed/MEDLINE, Science Direct, Web of Science, EBSCO, and Scopus. Studies were screened and reviewed following PRISMA guidelines. Risk of bias was assessed using the Quality Assessment Tool for Diagnostic Accuracy Studies. Extracted data included AI model type, dataset source, and reported classification accuracy. We also tracked which AI techniques were most commonly used across studies.

Results: A total of 79 studies met inclusion criteria. These studies drew on a range of clinical and publicly available voice datasets and employed diverse AI classification models. The most common techniques were Support Vector Machines (n = 28) and Convolutional Neural Networks (n = 22). Across all studies, the mean classification accuracy for detecting voice disorders was 92%. Notably, 9 studies reported perfect accuracy (100%), and 32 reported accuracies between 95% and 99%. Despite high accuracies, many studies relied on internal validation, with few including external test sets.
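As an illustration only, the kind of tally reported above could be computed from a data-extraction sheet along the following lines. This is a minimal sketch, not the review's actual workflow: the `records` list holds made-up placeholder entries (not data from the included studies), the field names and the `summarize` helper are hypothetical, and the "95% to 99%" band is assumed to mean at least 95% and below 100%.

```python
from collections import Counter
from statistics import mean

# Hypothetical extraction records for illustration only.
# These are NOT results from the studies included in the review.
records = [
    {"model": "SVM", "accuracy": 100.0, "external_test": False},
    {"model": "CNN", "accuracy": 97.5, "external_test": False},
    {"model": "SVM", "accuracy": 88.0, "external_test": True},
    {"model": "Random Forest", "accuracy": 91.0, "external_test": False},
]


def summarize(records):
    """Tally model types and accuracy bands across extracted studies."""
    accuracies = [r["accuracy"] for r in records]
    return {
        "n_studies": len(records),
        # How often each model type was used across studies.
        "model_counts": Counter(r["model"] for r in records),
        # Unweighted mean of the reported accuracies (in percent).
        "mean_accuracy": mean(accuracies),
        # Studies reporting perfect accuracy.
        "n_perfect": sum(a == 100.0 for a in accuracies),
        # Assumed band: at least 95% and strictly below 100%.
        "n_95_to_99": sum(95.0 <= a < 100.0 for a in accuracies),
        # Studies that evaluated on an external test set.
        "n_external": sum(r["external_test"] for r in records),
    }


summary = summarize(records)
print(summary)
```

Keeping an `external_test` flag next to each reported accuracy makes it straightforward to see how many of the high-accuracy results rest on internal validation alone.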

Conclusions: AI models demonstrate strong potential for detecting non-neurological voice disorders with high accuracy. Support Vector Machines and Convolutional Neural Networks were frequently selected for, and effective in, voice disorder detection tasks. However, concerns remain regarding a trend toward hyperoptimization, overfitting, and a lack of generalizability arising from the limited datasets used and insufficient external validation. To translate these findings into clinical tools, future research should emphasize improved model transparency and rigorous validation using external data. Additionally, AI-supported review workflows can streamline systematic review processes and increase reproducibility.

Charles Nudelman, Virginia Tardini, Pasquale Bottalico