Integrating Teeth from CT and Zero Echo Time MRI in Vocal Tract Models for Area Function Estimation
Introduction:
Magnetic resonance imaging (MRI)-based vocal tract models offer a comprehensive way to study vowel production, swallowing, and voice quality using the shape of the tract and the area functions derived from its volume¹. Teeth form an integral part of the vocal tract and alter its shape. However, the conventional gradient echo (GRE) MRI sequences widely used to generate vocal tract models do not capture the teeth. Bony structures relevant to speech production have short spin-spin relaxation (T2) time constants, which cause their signal to decay rapidly after excitation. As a result, MRI sequences with echo times longer than even 1 ms are insensitive to signal from these structures. Zero echo time (ZTE) MRI techniques switch on the gradients before the excitation pulse, enabling imaging at an effectively "zero" echo time, and have demonstrated utility in imaging bony structures. This study compares vocal tract models generated solely from GRE-MRI with two teeth-inclusive hybrid models: ZTE-MRI and CT-MRI.
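To make the echo-time argument concrete, the sketch below evaluates the mono-exponential transverse decay S(TE) = S0 * exp(-TE / T2). The T2 value of 0.4 ms is a hypothetical short relaxation time chosen only for illustration, not a value measured in this study; in gradient echo imaging the effective constant is T2*, which is shorter still.

```python
import math

def remaining_signal_fraction(te_ms: float, t2_ms: float) -> float:
    """Fraction of transverse magnetization left at echo time TE,
    assuming mono-exponential decay S(TE) = S0 * exp(-TE / T2)."""
    return math.exp(-te_ms / t2_ms)

# Hypothetical short T2 for a hard tissue, for illustration only.
T2_SHORT_MS = 0.4

for te in (0.0, 0.1, 1.0, 2.5):
    frac = remaining_signal_fraction(te, T2_SHORT_MS)
    print(f"TE = {te:4.1f} ms -> {100 * frac:5.1f}% of initial signal")
```

Under this assumed T2, only about 8% of the initial signal remains at TE = 1 ms (exp(-2.5) ≈ 0.082), whereas a ZTE acquisition samples close to the full signal.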
Methods:
Three professional voice users were recruited and scanned with upper-airway ultra-low-dose CT and vocal tract MRI protocols in accordance with the University of Iowa's IRB protocol (IRB # 202202556). ZTE scans were post-processed² to highlight the bone structures of the upper airway pertinent to speech production. The vocal tract soft tissue was segmented from the upsampled GRE scan and combined with the ZTE- and CT-derived bone structures to generate two hybrid vocal tract models.
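One simple way to express the combination step is a voxel-wise Boolean operation: airway voxels that the bone modality labels as teeth are removed from the GRE-derived airway. This sketch assumes the GRE airway segmentation and the ZTE- or CT-derived teeth mask are already co-registered and resampled onto the same upsampled grid; the function names and the voxel size in the toy example are hypothetical, and registration and resampling (where most of the real difficulty lies) are not shown.

```python
import numpy as np

def hybrid_airway(airway: np.ndarray, teeth: np.ndarray) -> np.ndarray:
    """Remove teeth voxels from a GRE-derived airway mask.

    Both inputs are 3-D masks on the same (upsampled) grid."""
    if airway.shape != teeth.shape:
        raise ValueError("airway and teeth masks must share one grid")
    return airway.astype(bool) & ~teeth.astype(bool)

def removed_volume_mm3(airway, teeth, voxel_size_mm):
    """Volume of GRE airway that the teeth mask occupies, in mm^3."""
    overlap = np.count_nonzero(airway.astype(bool) & teeth.astype(bool))
    return overlap * float(np.prod(voxel_size_mm))

# Toy example: a 10x10x10 airway block with a 2x2x2 "tooth" inside it.
airway = np.ones((10, 10, 10), dtype=bool)
teeth = np.zeros_like(airway)
teeth[4:6, 4:6, 4:6] = True
hybrid = hybrid_airway(airway, teeth)
print(hybrid.sum(), removed_volume_mm3(airway, teeth, (0.5, 0.5, 0.5)))
```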
A vocal tract area function was derived from each reconstructed vocal tract model by fitting a 40-point fixed airway centerline extending from just above the glottis to the lips in the midsagittal plane. At each centerline point, an oblique slice perpendicular to the centerline was computed and the cross-sectional area of the airway within it was measured. These cross-sectional areas, ordered by distance from the glottis, form the vocal tract area function for each reconstructed vocal tract shape (all /a/ vowels in the present study).
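The slicing step can be sketched in NumPy, assuming a boolean airway mask with isotropic voxels and a centerline given as an ordered (N, 3) array of voxel coordinates from glottis to lips (N = 40 here). The tangent at each point is estimated by finite differences, the perpendicular plane is sampled on a fine square grid with nearest-neighbour lookup, and the number of airway samples times the area per sample gives the cross-sectional area. Parameter names and defaults (`half_width_mm`, `step_mm`) are illustrative assumptions, not the study's settings. For simplicity the sketch counts every airway sample in the window; a fuller implementation would keep only the connected region containing the centerline point, since an oblique plane can cut the tract more than once.

```python
import numpy as np

def area_function(mask, centerline, voxel_mm=1.0, half_width_mm=30.0, step_mm=0.25):
    """Cross-sectional areas of a binary airway mask along a centerline.

    mask       : 3-D boolean array indexed [x, y, z], isotropic voxels of size voxel_mm
    centerline : (N, 3) points in voxel coordinates, ordered glottis -> lips
    Returns (distance_from_glottis_mm, area_mm2), each of length N.
    """
    mask = np.asarray(mask, dtype=bool)
    pts = np.asarray(centerline, dtype=float)

    # Unit tangent at each centerline point from finite differences.
    tangents = np.gradient(pts, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)

    # Square sampling grid (in mm) for the oblique plane, centred on the point.
    s = np.arange(-half_width_mm, half_width_mm + step_mm / 2, step_mm)
    uu, vv = np.meshgrid(s, s, indexing="ij")
    uu, vv = uu.ravel(), vv.ravel()

    areas = np.empty(len(pts))
    for i, (p, t) in enumerate(zip(pts, tangents)):
        # Two orthonormal in-plane axes u, v perpendicular to the tangent t.
        helper = np.array([1.0, 0.0, 0.0]) if abs(t[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        u = np.cross(t, helper)
        u /= np.linalg.norm(u)
        v = np.cross(t, u)
        # Plane sample positions, converted from mm offsets to voxel coordinates.
        samples = p + (np.outer(uu, u) + np.outer(vv, v)) / voxel_mm
        idx = np.rint(samples).astype(int)  # nearest-neighbour lookup
        inside = np.all((idx >= 0) & (idx < mask.shape), axis=1)
        hits = np.zeros(len(idx), dtype=bool)
        hits[inside] = mask[tuple(idx[inside].T)]
        areas[i] = hits.sum() * step_mm ** 2  # sample count x area per sample (mm^2)

    # Distance from the glottis = cumulative arc length along the centerline.
    seg_mm = np.linalg.norm(np.diff(pts, axis=0), axis=1) * voxel_mm
    dist_mm = np.concatenate([[0.0], np.cumsum(seg_mm)])
    return dist_mm, areas
```

A straight cylinder of radius 10 voxels is a convenient check, because every slice should then have an area close to π · 10² ≈ 314 mm².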
Results:
The ZTE-MRI and CT-MRI vocal tract models yield smaller cross-sectional areas near the lips than the GRE-MRI-only models, reflecting the space occupied by the teeth. This reduction is not captured by a GRE-MRI-only model, because this modality commonly fails to image teeth. We aim to use these vocal tract area functions to calculate frequency response functions, which will help us understand the acoustic consequences of the teeth versus no-teeth conditions and of the CT-teeth versus ZTE-MRI-teeth conditions³.
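As one possible route from an area function to a frequency response, the sketch below uses a generic lossless concatenated-tube (chain-matrix) model; it is a textbook technique, not necessarily the method of the cited work. It assumes equal-length sections (for example, total tract length divided by the number of area samples), treats the lips as an ideal open end with zero acoustic pressure, and ignores wall, viscous, and radiation losses. The air density and sound speed are assumed values. A uniform 17.5 cm tube serves as a sanity check, since its resonances should fall near odd multiples of c/4L, about 500, 1500, and 2500 Hz.

```python
import numpy as np

RHO = 1.14e-3  # assumed air density in the vocal tract, g/cm^3
C = 35000.0    # assumed speed of sound, cm/s

def transfer_function(areas_cm2, section_len_cm, freqs_hz):
    """U_lips / U_glottis for a lossless chain of equal-length uniform tubes.

    areas_cm2 is ordered glottis -> lips; the lips are an ideal open end
    (zero acoustic pressure), so radiation and losses are ignored."""
    areas = np.asarray(areas_cm2, dtype=float)
    h = np.empty(len(freqs_hz), dtype=complex)
    for n, f in enumerate(freqs_hz):
        kl = 2.0 * np.pi * f / C * section_len_cm  # phase across one section
        cos_kl, sin_kl = np.cos(kl), np.sin(kl)
        k_total = np.eye(2, dtype=complex)
        for a in areas:
            z = RHO * C / a  # characteristic acoustic impedance of the section
            k_total = k_total @ np.array([[cos_kl, 1j * z * sin_kl],
                                          [1j * sin_kl / z, cos_kl]])
        # [P_glottis, U_glottis] = K [P_lips, U_lips]; with P_lips = 0,
        # U_glottis = K[1, 1] * U_lips.
        h[n] = 1.0 / k_total[1, 1]
    return h

def formants(areas_cm2, section_len_cm, fmax_hz=5000.0, df_hz=2.0):
    """Frequencies of the local maxima of |U_lips / U_glottis| below fmax_hz."""
    f = np.arange(df_hz / 2, fmax_hz, df_hz)
    mag = np.abs(transfer_function(areas_cm2, section_len_cm, f))
    # '>=' on the left breaks ties if a pole falls exactly between two samples.
    is_peak = (mag[1:-1] >= mag[:-2]) & (mag[1:-1] > mag[2:])
    return f[1:-1][is_peak]

# Sanity check: a uniform 17.5 cm tube split into 40 sections.
print(formants([5.0] * 40, 17.5 / 40)[:3])
```

Comparing the peaks obtained from GRE-only, ZTE-MRI, and CT-MRI area functions would be one way to quantify the teeth versus no-teeth contrast, with the caveat that a lossless model exaggerates peak heights.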
References:
1. Story, B. H. (2005). A parametric model of the vocal tract area function for vowel and consonant simulation. The Journal of the Acoustical Society of America, 117(5), 3231–3254.
2. Wiesinger, F., Sacolick, L. I., Menini, A., Kaushik, S. S., Ahn, S., Veit-Haibach, P., Delso, G., & Shanbhag, D. D. (2016). Zero TE MR bone imaging in the head. Magnetic Resonance in Medicine, 75(1), 107–114.
3. Story, B. H. (2013). Phrase-level speech simulation with an airway modulation model. The Journal of the Acoustical Society of America, 134(4), 2873. https://doi.org/10.1121/1.4816288