MARVEL: MAchine Learning–based Classification FramewoRk for Voice Evaluation across MuLtiple Disorders.


Background: Patients with neurological, respiratory, or other systemic diseases often experience voice and speech impairments that require continuous monitoring. However, current clinical assessments are largely subjective and resource-intensive, limiting timely, data-driven intervention. Prior research [1] demonstrates that acoustic features such as fundamental frequency, jitter, shimmer, harmonics-to-noise ratio (HNR), and smoothed cepstral peak prominence (CPPS) can distinguish pathological from healthy voices, yet most of these studies are constrained by small datasets and limited scalability.
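As an illustration of how these parameters can be computed programmatically, the sketch below uses the praat-parselmouth Python library; the file name and the 75–500 Hz pitch floor/ceiling are placeholder assumptions, and CPPS (which Praat derives from a PowerCepstrogram object) is omitted for brevity.

```python
# Minimal sketch: extracting the acoustic parameters named above with
# praat-parselmouth. "sample.wav" and the 75-500 Hz pitch bounds are
# illustrative assumptions, not values taken from [1] or [2].
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("sample.wav")

# Fundamental frequency (F0) from Praat's pitch tracker.
pitch = snd.to_pitch()
f0_mean = call(pitch, "Get mean", 0, 0, "Hertz")

# Glottal pulse train, needed by the jitter and shimmer measures.
points = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, points], "Get shimmer (local)", 0, 0, 0.0001, 0.02, 1.3, 1.6)

# Harmonics-to-noise ratio via cross-correlation harmonicity analysis.
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)

print(f"F0={f0_mean:.1f} Hz  jitter={jitter:.4f}  shimmer={shimmer:.4f}  HNR={hnr:.1f} dB")
```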
Objective: To develop and evaluate a scalable, interpretable, and data-driven framework for clinical voice assessment using machine learning models and Explainable AI.
Dataset: The Bridge2AI-Voice dataset [2] contains 19,271 voice recordings from 442 participants across five clinical sites, spanning five disorder categories: voice disorders, neurological/neurodegenerative conditions, mood disorders, respiratory disorders, and pediatric voice and speech disorders. The dataset includes spectrograms, Mel-frequency cepstral coefficients (MFCCs), phenotype data, and other derived features.
Methods: This project proposes a machine learning–based classification framework for clinical voice assessment that leverages acoustic, temporal, and longitudinal data to distinguish pathological from non-pathological voices and support clinical decision-making. The study explores two complementary approaches. First, a rule-based baseline model enhances interpretability by computing z-scores of voice parameters relative to each patient's own baseline and flagging deviations or deterioration trends (first sketch below). Second, a supervised non-linear model, such as a recurrent neural network (RNN) with long short-term memory (LSTM) units, enhances predictive accuracy by learning complex patterns from labeled data (second sketch below). The model integrates acoustic, temporal, and delta (baseline-difference) features to classify multiple pathology categories, including voice, neurological, and psychiatric disorders.
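To make the rule-based baseline concrete, the sketch below flags recordings whose parameters deviate from a patient's own history by more than a chosen z-score threshold; the feature columns, the 2.0 cutoff, and the pandas layout are illustrative assumptions rather than choices reported above.

```python
# Sketch of the rule-based baseline: per-feature z-scores of a new recording
# relative to the same patient's earlier recordings. Column names and the
# z > 2 cutoff are illustrative assumptions.
import pandas as pd

FEATURES = ["f0_mean", "jitter", "shimmer", "hnr", "cpps"]  # assumed columns
Z_THRESHOLD = 2.0  # assumed deviation cutoff

def flag_deviations(history: pd.DataFrame, current: pd.Series) -> dict:
    """Compare one new recording against a patient's baseline history."""
    mu = history[FEATURES].mean()
    sigma = history[FEATURES].std(ddof=1)
    z = (current[FEATURES] - mu) / sigma
    return {f: {"z": float(z[f]), "flag": abs(z[f]) > Z_THRESHOLD} for f in FEATURES}
```

For the supervised branch, a minimal PyTorch LSTM classifier over per-frame acoustic features (e.g., MFCC frames) might look as follows; the layer sizes, the five output classes, and the fixed-length input are placeholders, not design decisions taken from the study.

```python
# Sketch of the supervised non-linear model: an LSTM over per-frame features
# with a multi-class pathology head. Hidden size, layer count, and the five
# output classes are assumptions for illustration.
import torch
import torch.nn as nn

class VoiceLSTM(nn.Module):
    def __init__(self, n_features: int = 13, hidden: int = 64, n_classes: int = 5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features); the last hidden state summarizes
        # the whole recording before classification.
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])

model = VoiceLSTM()
logits = model(torch.randn(8, 200, 13))  # 8 recordings, 200 frames each
print(logits.shape)  # torch.Size([8, 5])
```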
Expected Results: Preliminary analysis indicates that the proposed non-linear framework effectively captures audio characteristics, with sanity checks and initial experiments supporting its feasibility. The research aims to lay the foundation for continuous, non-invasive voice-based health monitoring, with the potential to improve early detection, personalized intervention, and patient outcomes.
Conclusion: The framework advances the integration of voice biomarkers into telehealth and clinical decision support systems.
[1] Cantor-Cutiva LC, et al. (2023). Screening of Voice Pathologies: Identifying the Predictive Value of Voice Acoustic Parameters for Common Voice Pathologies. Journal of Voice (in press).
[2] Bensoussan Y, et al. (2025). Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information. PhysioNet.

Monika Prajapati, Prinsa Ghimire, Sakshi Shrestha, Ahmad Al Doulat, Shehenaz Shaik, Lady Catherine Cantor-Cutiva, Kendrea Todt