Machine Learning Models of Vibrato as Diagnostic Biomarkers for Vocal Health, Dysphonia, TMD, and Tremor


Introduction/Background: Vibrato, a multifactorial feature of singing voice function and expression, holds significant potential for assessing vocal health beyond its traditional aesthetic role. Although vibrato is usually studied as an expressive device, recent research suggests it may also serve as a biomarker for neuromuscular dysfunction, biomechanical inefficiency, and motor control disorders. This study evaluates machine learning models that analyze vibrato characteristics and their associations with biomechanical efficiency, muscle tension, and motor control abnormalities. By integrating advanced vibrato metrics, time-varying acoustic features, and physiological data, we aim to improve diagnostic precision and bridge the gap between subjective perceptual assessment and objective, quantifiable voice analysis.

Methods: Multi-signal data were collected from 35 singers previously diagnosed with primary muscle tension dysphonia (pMTD) or temporomandibular disorder (TMD) through a collaborative voice care team at the McGill University Health Centre. Acoustic data originated from two prior vibrato studies recorded in identical environments to ensure reliability. Fifty expert judges (singing teachers and clinical voice specialists) rated vibrato samples for health, biomechanical efficiency, expressiveness, and regularity on a five-point Likert scale. Perceptual ratings were compared with acoustic measures, including the Acoustic Voice Quality Index (AVQI), smoothed cepstral peak prominence (CPPS), and vibrato variability time-profiles (Nestorova, 2025). In the TMD subset, motion capture (MOCAP) and surface electromyography (sEMG) quantified jaw kinematics and muscle activation, respectively. A convolutional neural network (CNN) baseline was optimized with Optuna-based hyperparameter tuning over the number of convolutional layers (1–3), kernel size (2–4), hidden units (100–200), learning rate (10⁻⁵ to 10⁻¹), and dropout rate (0.01–0.1).
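The reported CNN search space can be written down explicitly. The sketch below is a stdlib-only illustration, not the authors' code: it swaps Optuna's default adaptive (TPE) sampler for plain random search, and it assumes the learning rate is sampled on a log scale and the dropout rate uniformly, which the abstract does not state. The names `SEARCH_SPACE`, `sample_config`, and `random_search` are hypothetical. In Optuna itself, the same ranges would typically be expressed with `trial.suggest_int` and `trial.suggest_float(..., log=True)` inside an objective passed to `study.optimize`.

```python
import math
import random

# Ranges as reported in the abstract (inclusive). Sampling scales are assumptions.
SEARCH_SPACE = {
    "n_conv_layers": (1, 3),        # integer
    "kernel_size": (2, 4),          # integer
    "hidden_units": (100, 200),     # integer
    "learning_rate": (1e-5, 1e-1),  # float, log-uniform (assumed)
    "dropout": (0.01, 0.1),         # float, uniform (assumed)
}


def sample_config(rng: random.Random) -> dict:
    """Draw one hypothetical configuration from SEARCH_SPACE."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    return {
        "n_conv_layers": rng.randint(*SEARCH_SPACE["n_conv_layers"]),
        "kernel_size": rng.randint(*SEARCH_SPACE["kernel_size"]),
        "hidden_units": rng.randint(*SEARCH_SPACE["hidden_units"]),
        # Sample the exponent uniformly so each decade is equally likely.
        "learning_rate": 10 ** rng.uniform(math.log10(lr_lo), math.log10(lr_hi)),
        "dropout": rng.uniform(*SEARCH_SPACE["dropout"]),
    }


def random_search(objective, n_trials: int, seed: int = 0):
    """Plain random search: return (best_score, best_config) maximizing objective."""
    rng = random.Random(seed)
    best_score, best_config = -math.inf, None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


if __name__ == "__main__":
    # Toy placeholder objective (hypothetical): prefers about 150 hidden units.
    # A real objective would train the CNN and return validation accuracy.
    best_score, best_config = random_search(
        lambda c: -abs(c["hidden_units"] - 150), n_trials=50
    )
    print(best_score, best_config)
```

In practice, `objective` would train the CNN with the given configuration and return a validation metric; the toy objective only exercises the search loop.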

Results: Greater vibrato variability correlated significantly with increased muscle tension, particularly among singers with painful TMD. Confusion matrix analysis across the training (N=600), validation (N=75), and test (N=76) sets showed condition-dependent separability: per-class classification accuracy was 83–90% for healthy controls, 67–69% for pMTD, and 23–30% for painful TMD. The CNN reached 100% accuracy on synthetic audio and 97–98% on real audio, effectively differentiating healthy vibrato from dysfunction-associated irregularities, although performance varied by condition.
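Per-condition accuracies like those above are read off a confusion matrix. As a minimal sketch (not the study's pipeline), the Python below builds a confusion matrix with rows as true classes and columns as predictions, then computes per-class accuracy (diagonal count over row total, i.e. recall) and overall accuracy. The class order and helper names are hypothetical, and the toy labels are illustrative only, not study data.

```python
from typing import List, Sequence

# Hypothetical class order; indices 0..2 are used as integer labels.
CLASSES = ["Healthy Control", "pMTD", "painful TMD"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Count predictions: rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_accuracy(cm: List[List[int]]) -> List[float]:
    """Diagonal count over row total (per-class recall); NaN for empty classes."""
    result = []
    for i, row in enumerate(cm):
        total = sum(row)
        result.append(row[i] / total if total else float("nan"))
    return result


def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples that fall on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else float("nan")


if __name__ == "__main__":
    # Toy labels for illustration only (not study data).
    y_true = [0, 0, 0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 0, 2, 0]
    cm = confusion_matrix(y_true, y_pred, len(CLASSES))
    for name, acc in zip(CLASSES, per_class_accuracy(cm)):
        print(f"{name}: {acc:.2f}")
    print(f"overall: {overall_accuracy(cm):.3f}")
```

Because overall accuracy weights each class by its number of samples, it can sit well above the accuracy of a poorly recognized minority class, so per-class figures are worth reporting alongside it.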

Conclusions: Vibrato-based AI analysis shows strong diagnostic promise. The relationship between vibrato stability, muscle tension, and neuromotor control highlights vibrato as a sensitive marker for pMTD and TMD, with emerging potential as a vocal tremor biomarker. Although model precision was high, broader datasets and standardized protocols are needed before clinical and pedagogical translation. Further refinement of feature extraction may establish vibrato as a key biomarker linking vocal artistry with neuromuscular health.

Theodora Nestorova, Felicia Nyanyo, Jiarui Xie, Chonghui Zhang, Yaoyao Fiona Zhao, Luc Mongeau