Maximum Phonation Time as a Biomarker of Health: Pre-processing Methods and Preliminary Results for Vocal Fold Paralysis Using the Bridge2AI Dataset 


Background: Maximum phonation time (MPT) is a non-invasive acoustic measure that reflects glottal efficiency and is routinely evaluated in unilateral vocal fold paralysis (UVFP). Multiple studies report that patients with UVFP have reduced MPT compared with healthy controls. Automated trimming of MPT recordings uses software to identify the onset and offset of phonation from different acoustic features, minimizing human error. However, little research has examined which trimming method yields the most accurate MPT values, especially for large datasets in which manual annotation is too burdensome.
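To make "identifying onset and offset from an acoustic feature" concrete, the sketch below uses one such feature, short-time energy, as in the energy-based method compared in this study. It is a minimal illustration under stated assumptions, not the study's implementation: the function name `energy_trim_mpt`, the 25 ms frame, the 10 ms hop, and the -35 dB threshold (relative to the loudest frame) are all hypothetical choices.

```python
import numpy as np


def energy_trim_mpt(signal, sr, frame_ms=25.0, hop_ms=10.0, threshold_db=-35.0):
    """Return (onset_s, offset_s, mpt_s) from a short-time energy threshold.

    Hypothetical parameters: frame/hop lengths and the dB threshold relative
    to the loudest frame are illustrative values, not those used in the study.
    """
    x = np.asarray(signal, dtype=float)
    frame = int(round(sr * frame_ms / 1000))
    hop = int(round(sr * hop_ms / 1000))
    if len(x) < frame:
        return 0.0, 0.0, 0.0

    # Short-time RMS energy of each analysis frame
    n_frames = 1 + (len(x) - frame) // hop
    rms = np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame] ** 2))
                    for i in range(n_frames)])

    eps = 1e-12
    if rms.max() < eps:
        # Completely silent recording: no phonation detected
        return 0.0, 0.0, 0.0

    # Frame level in dB relative to the loudest frame
    level_db = 20 * np.log10(rms + eps) - 20 * np.log10(rms.max() + eps)
    voiced = np.flatnonzero(level_db >= threshold_db)

    # Onset = start of first frame above threshold; offset = end of last one
    onset = voiced[0] * hop / sr
    offset = (voiced[-1] * hop + frame) / sr
    return onset, offset, offset - onset
```

The trimmed duration depends directly on the threshold and framing choices, which is one reason why energy-based and pitch-based trimming can return different MPT values for the same file.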

Objective: This study aims to identify the most accurate method to preprocess and analyze MPT, using the Bridge2AI-Voice dataset as a use case. The secondary objective is to compare MPT values between patients with vocal fold paralysis and controls in the Bridge2AI-Voice dataset, in which the controls have a range of other disorders.

Methods: MPT recordings from Bridge2AI-Voice patients with and without UVFP were automatically trimmed using two commonly used methods: energy-based and Praat (pitch-based). When the durations from the two methods differed by more than 1 second, the file was trimmed manually as a fallback. Traditional statistical analysis was used to compare post-trimming MPT values between patients with and without UVFP.
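The fallback rule above can be sketched as a per-file decision: if the two automatic durations differ by more than 1 second, the file goes to a manual-trimming queue. This is a hypothetical sketch; the names `TrimDecision` and `reconcile_mpt` are illustrative, and because the abstract does not say which value is kept when the methods agree, reporting their mean is an explicit assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrimDecision:
    file_id: str
    energy_s: float            # MPT from energy-based trimming
    praat_s: float             # MPT from Praat (pitch-based) trimming
    needs_manual: bool         # True when the fallback to manual trimming applies
    mpt_s: Optional[float]     # None until a human annotator trims the file


def reconcile_mpt(file_id, energy_s, praat_s, tolerance_s=1.0):
    """Flag a file for manual trimming when the two methods disagree by > tolerance_s."""
    if abs(praat_s - energy_s) > tolerance_s:
        return TrimDecision(file_id, energy_s, praat_s, True, None)
    # Assumption (not stated in the abstract): report the mean when methods agree
    return TrimDecision(file_id, energy_s, praat_s, False, (energy_s + praat_s) / 2)
```

For a batch, `[d.file_id for d in decisions if d.needs_manual]` gives the files to route to manual annotation. Using a strict `>` mirrors the "greater than 1 second" wording, so a difference of exactly 1.0 s stays automatic.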

Results: A total of 492 files from 164 patients (mean age 59.41 years; range 19-90 years) were analyzed. Of the 492 files trimmed by both the Praat and energy-based methods, 118 (24.0%) differed by more than 1s. Among these 118 files, the Praat method yielded the longer MPT in 93 (79%), compared with 25 (21%) for energy-based trimming. Mean MPT was 12.23 ± 7.51s with energy-based trimming and 13.37 ± 7.31s with the Praat method. The mean difference was 1.14 ± 2.84s, with a median of 0.03s. A pooled two-tailed t-test was performed (p = 0.012). There was a statistically significant difference in MPT values between patients with and without UVFP (17.32s).
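The summary figures reported above (share of files over the 1 s threshold, which method was longer, mean ± SD per method, mean and median difference) can be reproduced from paired per-file durations with a short script. This sketch is an assumption about how such numbers might be computed, not the authors' code; `summarize_trimming` is a hypothetical name. It uses only the standard library and computes the pooled-variance (Student) two-sample t statistic by hand. A two-tailed p-value would then typically come from `scipy.stats.t.sf(abs(t), df) * 2`, or directly from `scipy.stats.ttest_ind(praat, energy, equal_var=True)`. As a consistency check on the reported values, 13.37 − 12.23 = 1.14 s, matching the reported mean difference, as expected when both methods are applied to the same files.

```python
import math
import statistics as st


def summarize_trimming(energy, praat, tolerance_s=1.0):
    """Descriptive comparison of two MPT trimming methods over the same files."""
    if len(energy) != len(praat) or len(energy) < 2:
        raise ValueError("need two equal-length lists with at least 2 files")

    diffs = [p - e for e, p in zip(energy, praat)]          # Praat minus energy, per file
    flagged = [d for d in diffs if abs(d) > tolerance_s]    # files needing manual trimming

    n1, n2 = len(energy), len(praat)
    s1, s2 = st.stdev(energy), st.stdev(praat)

    # Pooled-variance (Student) two-sample t statistic; degrees of freedom n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (st.mean(praat) - st.mean(energy)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    return {
        "pct_flagged": 100 * len(flagged) / n1,
        "pct_praat_longer_of_flagged": (
            100 * sum(d > 0 for d in flagged) / len(flagged) if flagged else float("nan")
        ),
        "energy_mean_sd": (st.mean(energy), s1),
        "praat_mean_sd": (st.mean(praat), s2),
        "diff_mean_sd": (st.mean(diffs), st.stdev(diffs)),
        "diff_median": st.median(diffs),
        "t_pooled": t,
        "df": n1 + n2 - 2,
    }
```

A small median difference alongside a larger mean difference, as reported here (0.03s vs 1.14s), suggests that most files agree closely and that a minority of large disagreements drives the mean.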

Conclusion: Preliminary results suggest a statistically significant difference between MPT values obtained with Praat and with energy-based trimming. Understanding these differences is essential when analyzing large voice datasets in which manual annotation is not feasible.

Kirollos Armosh, Helena Beltran, Shrramana Sudhakar, Mohamed Ebraheem, Yael Bensoussan