Evaluating AI-Generated Speech for Intelligibility Analysis in Dysphonic and Non-Dysphonic Speakers


Objective:
This study explores the use of AI-generated speech to evaluate speech intelligibility, with a focus on comparing intelligibility between dysphonic and non-dysphonic speakers. Speech intelligibility is essential for understanding how listeners perceive spoken language, especially in individuals with voice disorders. By analyzing both dysphonic and normal voices, the study assesses whether AI-generated speech accurately reflects real voice recordings in intelligibility assessments, offering a scalable alternative for intelligibility research, a field often limited by the scarcity of high-quality dysphonic speech samples.

Methods:
The study sample included 12 speakers, evenly divided between normal and dysphonic voices, and balanced by sex. Each participant recorded one minute of spontaneous speech to train an AI model for speech synthesis. Additionally, each participant recorded a list from the Hearing in Noise Test (HINT), a standardized speech intelligibility test. Using these recordings, the AI model generated a second set of synthesized HINT speech lists. Thirty native American English speakers participated in an intelligibility test to evaluate both real and AI-generated samples. Their responses were used to compare the clarity and intelligibility of each recording type.

Results:
Results indicate that AI-generated speech closely matches real recordings in intelligibility. Averaged across the 30 listeners, scores showed no significant differences in intelligibility between AI-generated and real recordings for either dysphonic or non-dysphonic voices. Listeners understood words from both recording types with similar accuracy, suggesting that AI-generated speech captures essential vocal characteristics, retaining qualities such as breathiness or hoarseness that are crucial for intelligibility assessment.

Conclusions:
The findings support AI-generated speech as a reliable tool for intelligibility studies, especially in dysphonic populations. This approach offers a solution to the shortage of high-quality dysphonic speech samples, expanding research opportunities in speech and voice disorders. By enabling accurate, AI-based synthesis of dysphonic speech, this method facilitates broader investigations into speech intelligibility, potentially aiding clinicians and researchers in developing improved diagnostic and therapeutic tools for voice disorders.

Pasquale Bottalico, Charles Nudelman, Kate Harty, Aron Olivares, Ryan Anderson, Keiko Ishikawa