Application of Artificial Intelligence and Its Integration with Voice Science in Vocal Fold Injury Prediction and Recovery Due to Thyroidectomy: A Narrative Review
Objective:
Artificial intelligence (AI) has recently attracted attention for diagnosing disease, anticipating its spread, and optimizing treatment protocols. Given the importance of interdisciplinary collaboration across AI, surgery, and voice science, this review aimed to determine how AI is applied, and how it is integrated with voice science, in predicting vocal fold injury and recovery following thyroidectomy.
Methods:
PubMed, Web of Science (WOS), and Scopus were searched for studies published between 2000 and 2024 that reported on the application of AI in thyroid surgery with respect to the recurrent laryngeal nerve. The keywords were thyroid surgery, artificial intelligence, recurrent laryngeal nerve (RLN) injury, vocal fold (VF) injury, voice, and their synonyms.
Results:
The review identified two AI applications. First, AI offers models for preventing VF injury in two ways: (1) predicting which patients are at high risk of VF injury, for which two models were proposed: (a) a Super Learner algorithm that attained an area under the receiver operating characteristic curve (AUROC) of 0.628, and (b) a support vector machine (SVM) with a radial basis function (RBF) kernel and oversampling that attained an AUROC of 1.00 and 100% accuracy; and (2) recognizing the RLN during surgery using a cropping and segmentation module that achieved a Dice similarity coefficient of 0.707 under appropriate imaging conditions. Second, AI models can estimate the likelihood of vocal recovery after thyroidectomy-induced VF paralysis: EfficientNet, a convolutional neural network (CNN) architecture, followed by long short-term memory (LSTM) units, processes spectrograms of patients' voice samples to predict GRBAS scores three months after surgery. Predicted and actual values were significantly correlated for breathiness, grade, and asthenia.
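The two metrics quoted above can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the function names, the toy masks, and the toy risk scores are hypothetical, and the handling of two empty masks is an assumed convention. The Dice similarity coefficient is computed as twice the overlap between the predicted and reference masks divided by their combined size, and AUROC as the probability that a randomly chosen injured case receives a higher risk score than a randomly chosen uninjured case, with ties counted as one half.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled as nerve in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical 4-pixel masks: one pixel overlaps, each mask has two pixels.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
    # Hypothetical risk scores for two uninjured (0) and two injured (1) cases.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

On this scale, an AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes, while a Dice coefficient of 1.0 means the predicted and reference masks coincide exactly.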
Conclusion:
VF injury prediction appears feasible, but model performance depends strongly on appropriate clinical inputs, such as perioperative laboratory data (particularly those affecting phonation), as well as on demographic diversity and a sufficient sample size. Models with insufficient clinical inputs or small sample sizes performed worse. For developing a model to identify the RLN, proper lighting, an appropriate image-capture distance, and multiple images taken during surgery are critical. Voice quality is an essential variable in predicting vocal recovery. Because voice is multidimensional, acoustic voice features and stroboscopy findings obtained before and after surgery may help in designing more accurate models for predicting vocal recovery.