Automated Classification of Glottal Closure Configurations in Sustained Phonation Using Laryngeal High-Speed Videoendoscopy
Objective: The glottal closure configuration of the vocal folds plays a critical role in effective voice production. However, due to factors such as variations in phonatory tasks, vocal intensity, pitch, and individual anatomical or physiological differences, complete closure of the vocal folds is not always observed. Additionally, voice disorders that alter the structure or function of the laryngeal mechanisms can further disrupt normal glottal closure, resulting in irregular or incomplete closure patterns and reduced vocal efficiency. This paper develops an automated approach for the classification of glottal closure configurations from laryngeal high-speed videoendoscopy (HSV) data in sustained phonation, providing objective measures that may ultimately support improved diagnosis and characterization of voice disorders.
Methods: HSV data and audio recordings were obtained simultaneously from 14 vocally normal speakers and 14 participants with voice disorders during sustained production of the vowel /i/ at habitual pitch and loudness. A deep learning framework based on a convolutional neural network (CNN) architecture was designed and implemented to automatically classify glottal closure configurations from HSV images. Statistical analyses were then performed to compare classification outcomes and glottal closure patterns between the vocally normal and disordered groups.
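To make the group comparison in the Methods concrete, the following is a minimal sketch of one way frame-level classifier outputs could be summarized per recording and compared between groups. It is not the analysis used in this work: the label names in LABELS, the function names, and the choice of a permutation test are all assumptions made for illustration, since the abstract does not specify the closure categories or the statistical tests.

```python
from collections import Counter
from statistics import mean
import random

# Hypothetical label set: the abstract does not list the categories used.
LABELS = ("complete", "posterior_gap", "anterior_gap", "incomplete")


def closure_proportions(frame_labels):
    """Return the fraction of frames assigned to each label in LABELS."""
    if not frame_labels:
        raise ValueError("frame_labels is empty")
    unknown = set(frame_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(frame_labels)
    return {label: counts[label] / len(frame_labels) for label in LABELS}


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Assumed stand-in for the (unspecified) statistical analysis: each value
    is one participant's proportion of frames with a given closure type.
    """
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

In use, closure_proportions would be applied to the per-frame predictions of each recording, and permutation_p_value to the resulting per-participant proportions for one closure type in the vocally normal and disordered groups. A permutation test is chosen here only because it makes no distributional assumptions for small groups.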
Results and Conclusions: The automated approach accurately classified glottal closure configurations from HSV data. The model identified and categorized distinct glottal closure patterns across phonatory behaviors, providing a reliable framework for objective assessment of vocal fold closure characteristics. This work investigates the types of glottal closure and their variability among vocally normal individuals and those with voice disorders. Understanding these patterns is essential for accurately evaluating glottal closure in clinical voice assessment.
Acknowledgments: We acknowledge support from NIH NIDCD grants R21DC020003, K01DC017751, and R01DC019402.