A Deep Learning-Based Vortical-Viscoelastic Model of Sustained Phonation: A Subject-Specific Paradigm


Objective: Most computational phonation models assume uniform or idealized glottal airflow as the inlet flow condition. However, realistic inlet airflow is inherently time-dependent, subject-specific, and phonation-dependent. The objective of this study is therefore to develop a subject-specific model of sustained phonation with a time-varying inlet condition. The temporal variation of the glottal area is extracted from laryngeal high-speed videoendoscopy (HSV) data and used to prescribe the vocal fold boundary motion. To capture supraglottal flow behavior and vortical flow signatures more accurately, the model also incorporates the supraglottal vocal tract geometry.
Method: A rigid HSV system was used to record data at 16,000 fps from a 49-year-old normophonic male subject during sustained phonation of /i/ at habitual pitch and loudness. A deep learning-based segmentation was applied to the HSV data to extract the glottal area waveform (GAW). Simultaneously recorded electroglottography (EGG) data were used to incorporate the vertical phase differences in vocal fold vibration into the glottal motion. This motion was then applied as a moving boundary condition in a two-dimensional finite element model of subglottal and supraglottal airflow. The model solves the incompressible Navier–Stokes equations using a velocity–vorticity formulation (v2f) for turbulence. The resulting aerodynamic flow structures were compared with those obtained using a constant subglottal pressure.
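As an illustration of the GAW extraction step described above, the following is a minimal sketch that assumes the segmentation network has already produced one binary glottis mask per HSV frame. The function name `glottal_area_waveform` and the optional `mm2_per_pixel` calibration factor are hypothetical and are not taken from the study; only the 16,000 fps frame rate comes from the abstract.

```python
import numpy as np


def glottal_area_waveform(masks, fps=16000.0, mm2_per_pixel=None):
    """Return (time, area) from a stack of binary glottis masks.

    masks: array of shape (n_frames, height, width); nonzero = glottis pixel.
    fps: recording frame rate (16,000 fps in the abstract).
    mm2_per_pixel: hypothetical calibration factor; if None, the area is
        reported in pixels, since endoscopic images are often uncalibrated.
    """
    masks = np.asarray(masks).astype(bool)
    if masks.ndim != 3:
        raise ValueError("masks must have shape (n_frames, height, width)")

    # Area per frame = number of pixels labelled as glottis.
    area = masks.reshape(masks.shape[0], -1).sum(axis=1).astype(float)
    if mm2_per_pixel is not None:
        area *= mm2_per_pixel

    # Time stamp of each frame in seconds.
    time = np.arange(masks.shape[0]) / fps
    return time, area
```

The resulting waveform could then be smoothed and mapped to a boundary displacement, but that mapping depends on details the abstract does not specify.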
Results: Simulations with the realistic time-varying glottal input yielded a glottal flow waveform with more pronounced skewness than that obtained with constant subglottal pressure. The supraglottal flow field also showed a different vortex shedding pattern. These differences indicate that simplified inlet conditions may fail to capture important features of glottal jet behavior and vortical structures.
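The abstract does not state how skewness was quantified. One common choice, shown here only as an assumed example, is the speed quotient of a single glottal cycle: the duration of the rising (opening) phase divided by the duration of the falling (closing) phase. The function name `speed_quotient` is hypothetical.

```python
import numpy as np


def speed_quotient(flow):
    """Speed quotient of one glottal flow cycle sampled at a uniform rate.

    flow: 1-D sequence covering a single cycle, from flow onset to flow offset.
    Returns (opening samples) / (closing samples); values above 1 mean the
    flow rises more slowly than it falls, i.e. the pulse is skewed to the right.
    """
    flow = np.asarray(flow, dtype=float)
    if flow.ndim != 1 or flow.size < 3:
        raise ValueError("flow must be a 1-D cycle with at least 3 samples")

    peak = int(np.argmax(flow))        # index of maximum flow
    opening = peak                     # samples from onset to peak
    closing = flow.size - 1 - peak     # samples from peak to offset
    if opening == 0 or closing == 0:
        raise ValueError("peak lies at the cycle boundary; check segmentation")
    return opening / closing
```

Comparing this quotient between the time-varying and constant-pressure simulations would be one way to put a number on the difference in skewness reported above.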
Conclusion: The proposed computational framework provides a patient-specific tool that can be adapted to individual voices for more accurate prediction of phonatory flow dynamics. By incorporating subject-specific HSV and EGG measurements into the simulation, the framework models realistic vocal fold vibration and the resulting airflow. It also enables more accurate differentiation between normophonic and disordered phonatory aerodynamic patterns, potentially assisting in patient-specific voice diagnosis and treatment planning.
Acknowledgment: We acknowledge the support from NIH NIDCD R21DC020003, K01DC017751, and R01DC019402.

Maruf Md Ikram
Maryam Naghibolhosseini
Dimitar D. Deliyski
Mohsen Zayernouri