Artificial Intelligence Models for Laryngeal Dystonia Classification


Objective

An important goal in vocal pathology research is to diagnose pathologies quickly, easily, reliably, and non-invasively. Current clinical diagnosis of Spasmodic Dysphonia / Laryngeal Dystonia (LD) relies on auditory-perceptual and videostroboscopic evaluations, which show low interrater reliability (up to 34%) among experts. Emerging artificial intelligence (AI) research on differentiating Adductor LD (ADLD) from other voice disorders has used sustained phonation samples or non-English language passages, limiting robustness. To the best of our knowledge, only one paper has used AI on English speech to differentiate ADLD from healthy speakers and other forms of dysphonia, in a small number of subjects using sustained vowels and sentences. Our goal is to investigate the extent to which deep learning models can differentiate (a) ADLD from healthy speakers and (b) severity levels within ADLD across a wide range of speech samples.

Methods

We assembled audio data from five sources: (1) Perceptual Voice Quality Database, (2) Uncommon Voice dataset, (3) ST American English Corpus (STAEC), (4) IU Vocal Physiology and Imaging Lab, and (5) MEEI database. We used datasets (1) through (4) for training and validation, and (5) exclusively for testing. We trained two models, a Convolutional Neural Network (CNN) with a ResNet18 backbone and a Recurrent Neural Network (RNN) with a Gated Recurrent Unit (GRU), to classify healthy speech versus ADLD across severity levels. We also experimented with several variants, including adversarial training and supervised contrastive learning. For each architecture and training variant, we evaluated classification accuracy using different subsets of training data to investigate generalizability across different types of speech signals.
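As an illustration of the CNN pipeline described above, the following is a minimal sketch (not our exact implementation) of a ResNet18-backbone binary classifier operating on log-mel spectrograms. The sample rate, mel parameters, and the single ADLD output logit are illustrative assumptions.

    # Minimal sketch: ResNet18 backbone over log-mel spectrograms for
    # healthy-vs-ADLD classification. Assumes 16 kHz mono waveforms;
    # mel parameters and the single output logit are illustrative choices.
    import torch
    import torch.nn as nn
    import torchaudio
    from torchvision.models import resnet18

    class SpectrogramResNet(nn.Module):
        def __init__(self, sample_rate=16000, n_mels=64):
            super().__init__()
            # Waveform -> log-mel spectrogram "image" for the CNN backbone.
            self.melspec = torchaudio.transforms.MelSpectrogram(
                sample_rate=sample_rate, n_mels=n_mels)
            self.to_db = torchaudio.transforms.AmplitudeToDB()
            self.backbone = resnet18(weights=None)
            # Adapt ResNet18 to 1-channel spectrogram input and one ADLD logit.
            self.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                            padding=3, bias=False)
            self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

        def forward(self, waveform):  # waveform: (batch, samples)
            x = self.to_db(self.melspec(waveform)).unsqueeze(1)  # (batch, 1, mels, frames)
            return self.backbone(x).squeeze(-1)                  # (batch,) logits

    model = SpectrogramResNet()
    logits = model(torch.randn(2, 16000))   # two 1-second dummy clips
    probs = torch.sigmoid(logits)           # P(ADLD) per clip

In the supervised contrastive and adversarial variants mentioned above, the same backbone would supply the embedding on which the auxiliary loss is applied; those losses are omitted here for brevity.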

Results

Our preliminary results show that the CNN with a ResNet18 backbone trained with supervised contrastive learning performs best among the models we evaluated, achieving an area under the curve (AUC) of 0.703 on the binary classification task of differentiating ADLD from healthy speakers on sentence inputs. We will report detailed findings on how different model architectures and training datasets perform in detecting ADLD and will offer directions and recommendations for future work.
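For reference, the AUC reported above can be computed on the held-out test set roughly as follows. This sketch assumes the model from the earlier example and a hypothetical test_loader yielding (waveform, label) batches, with label 1 = ADLD and 0 = healthy.

    # Illustrative only: AUC on a held-out test set using scikit-learn.
    import torch
    from sklearn.metrics import roc_auc_score

    model.eval()
    scores, labels = [], []
    with torch.no_grad():
        for waveform, label in test_loader:
            scores.extend(torch.sigmoid(model(waveform)).tolist())
            labels.extend(label.tolist())

    print(f"Test AUC: {roc_auc_score(labels, scores):.3f}")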

Hiroki Sato, Weslie Khoo, David Crandall, Rita Patel