The Role of Pause Features in Automated Voice Disorder Detection


Background: Machine learning shows promise as an objective tool for voice disorder detection. Previous findings showed that models trained on speech samples that retained pauses performed better in voice disorder detection than models trained on samples with pauses removed. Building on this observation, we investigated whether explicitly measuring temporal patterns through pause-based features improves machine learning performance in detecting voice disorders.

Methods: Voice samples from 200 participants (148 individuals with dysphonia and 52 controls) were analyzed retrospectively. Connected speech samples were processed using MATLAB and Praat to extract all features. The feature set included traditional acoustic metrics (perturbation, noise, cepstral, and spectral analyses) and four pause-based measures: pauses per minute, average pause duration, pause duration standard deviation, and pause rate. Support Vector Machine (SVM) models were trained and evaluated with and without the pause-based features.
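As an illustration of the four pause-based measures, the following is a minimal Python sketch, not the MATLAB/Praat pipeline used in the study. It assumes pauses have already been detected and are given as (start, end) times in seconds. The function name `pause_features`, the 0.25 s minimum pause threshold, the use of the population standard deviation, and the definition of pause rate as the fraction of total sample time spent pausing are all assumptions made for this example; the abstract does not specify them.

```python
from statistics import mean, pstdev


def pause_features(pauses, total_duration_s, min_pause_s=0.25):
    """Compute four pause-based measures from detected silent intervals.

    pauses: list of (start_s, end_s) tuples, one per silent interval.
    total_duration_s: length of the connected speech sample in seconds.
    min_pause_s: hypothetical threshold; shorter silences are ignored.
    """
    if total_duration_s <= 0:
        raise ValueError("total_duration_s must be positive")

    # Keep only silences long enough to count as pauses (assumed threshold).
    durations = [end - start for start, end in pauses
                 if end - start >= min_pause_s]
    n = len(durations)

    return {
        # Number of pauses normalized to one minute of speech.
        "pauses_per_minute": n / (total_duration_s / 60.0),
        # Mean and population standard deviation of pause durations.
        "mean_pause_duration_s": mean(durations) if n else 0.0,
        "sd_pause_duration_s": pstdev(durations) if n else 0.0,
        # Assumed definition: fraction of the sample spent pausing.
        "pause_rate": sum(durations) / total_duration_s,
    }


if __name__ == "__main__":
    # Made-up intervals for a 30 s sample, for demonstration only.
    example = [(1.0, 1.5), (4.0, 4.3), (8.0, 9.0), (12.0, 12.1)]
    for name, value in pause_features(example, 30.0).items():
        print(f"{name}: {value:.3f}")
```

In a setup like this, the resulting four values would be appended to each participant's acoustic feature vector before classifier training.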

Results: The addition of pause-based features to existing acoustic metrics improved classification accuracy from 0.79 to 0.84 in distinguishing between dysphonic and non-dysphonic voices in connected speech. These temporal patterns provided complementary information to traditional acoustic measures.

Conclusions: This study demonstrates the value of pause-based measures in connected speech analysis for voice disorder detection. The findings suggest that temporal patterns in speech, particularly pause characteristics, carry information relevant to detecting voice disorders that complements traditional acoustic measures. These results support the development of more sophisticated objective assessment methods for voice research and clinical applications.

Mark Berardi, Ahmed Yousef, Adrián Castillo-Allendes, Juiliana Codino, Adam Rubin, Eric Hunter