2024 Symposium Abstracts - THE VOICE FOUNDATION

Please use this searchable database to view abstract information from our 53rd Annual Symposium in 2024

Abstract Title	Optimization Refined Biomechanical Parameter Estimation of Vocal Fold Models using Neural Networks
Abstract	High-speed video endoscopy enables speech-language pathologists and physicians to inspect their patients’ vocal folds precisely, recording the organ’s motion at several thousand images per second. While such recordings ease the diagnosis of visually identifiable pathologies, they provide no insight into local tissue properties, nor do they allow estimation of the subglottal pressure, a key component of efficient phonation. To supplement recordings with biomechanical surrogate quantities for inaccessible tissue properties, mass-spring-damper systems (i.e., lumped mass models) have been fitted to the recorded vocal fold trajectories. The underlying mathematical problem is commonly understood as a highly non-convex optimization over the key quantities of the differential equations, such as masses, stiffnesses, and subglottal pressure. Unwanted convergence to local minima and the large number of model evaluations per recording are known weaknesses of such approaches. To counter those shortcomings, we have trained a Convolutional Recurrent Neural Network (CRNN) to estimate the parameters of a Six-Mass Model (6MM). Trained on the 6MM, which is longitudinally more fine-grained than two-mass models, our CRNN allowed for fast and improved subglottal pressure predictions at the cost of less accurate fundamental frequency matching of the predicted trajectories. By extending this approach with a lightweight optimization that uses the neural network’s prediction as its starting point, we seek to additionally improve the frequency replication of the 6MM. On a test dataset of 288 recordings obtained from several ex vivo porcine larynges in different phonatory configurations, a correlation of almost 80% between the measured ground-truth pressure values and the neural network’s predictions was achieved at a relative error of 12.8%.
With the additional optimization, we reduce the neural network’s 9.7% error in the fundamental frequency of the trajectories by exploiting convergence to nearby local minima. Detailed results on parameter estimation improvements will be presented. Applying such models and parameter estimation approaches yields further information on vocal fold biomechanics and may therefore be included in clinical voice assessment in the future.
First Name	Jonas
Last Name	Donhauser
Author #2 First Name	Bogac
Author #2 Last Name	Tur
Author #3 First Name	Anne
Author #3 Last Name	Schützenberger
Author #4 First Name	Michael
Author #4 Last Name	Döllinger
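
The refinement idea in the abstract (start a local optimization from a predicted parameter set so that it converges to a nearby minimum) can be illustrated with a much simpler toy problem. The sketch below is not the authors' 6MM, CRNN, or optimizer. It assumes a single free mass-spring-damper, treats a hard-coded stiffness as a stand-in for a network prediction, and uses a golden-section search to refine that stiffness so the simulated fundamental frequency matches a target. All names and numeric values (`MASS`, `DAMPING`, `refine_stiffness`, the search window) are hypothetical choices made for illustration.

```python
import math

# Hypothetical toy values, not taken from the abstract.
MASS = 1e-4      # kg
DAMPING = 0.01   # N*s/m
DT = 1e-5        # s, integration step
STEPS = 20000    # 0.2 s of simulated motion


def simulate(mass, stiffness, damping, dt=DT, steps=STEPS):
    """Free oscillation of one mass-spring-damper (semi-implicit Euler)."""
    x, v = 1e-3, 0.0  # initial displacement (m) and velocity (m/s)
    trajectory = []
    for _ in range(steps):
        a = -(stiffness * x + damping * v) / mass
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory


def fundamental_frequency(trajectory, dt=DT):
    """Estimate the frequency from upward zero crossings (linear interpolation)."""
    crossings = []
    for i in range(1, len(trajectory)):
        x0, x1 = trajectory[i - 1], trajectory[i]
        if x0 < 0.0 <= x1:
            crossings.append((i - 1 + x0 / (x0 - x1)) * dt)
    if len(crossings) < 2:
        return 0.0
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])


def golden_section(loss, lo, hi, iterations=30):
    """Minimize a unimodal 1-D loss on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = loss(c), loss(d)
    for _ in range(iterations):
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = loss(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = loss(d)
    return (lo + hi) / 2.0


def refine_stiffness(initial_guess, target_f0, window=0.5):
    """Search only near the initial guess (assumed window of +/- 50%)."""
    def loss(stiffness):
        f0 = fundamental_frequency(simulate(MASS, stiffness, DAMPING))
        return (f0 - target_f0) ** 2
    return golden_section(loss, (1.0 - window) * initial_guess,
                          (1.0 + window) * initial_guess)


if __name__ == "__main__":
    true_stiffness = 80.0  # N/m, generates the synthetic "measurement"
    predicted = 65.0       # N/m, stand-in for a neural network prediction
    target = fundamental_frequency(simulate(MASS, true_stiffness, DAMPING))
    refined = refine_stiffness(predicted, target)
    print("target f0 (Hz):", target)
    print("stiffness before/after refinement (N/m):", predicted, refined)
```

In this toy setting the loss has a single minimum inside the search window, so a one-dimensional search suffices. The problem described in the abstract is multi-parameter and highly non-convex, which is why a good starting point matters there.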