Please use this searchable database to view abstract information from our 53rd Annual Symposium in 2024

Abstract Title

Machine Learning-Based Model for Automatic Prediction of Listeners’ Attitudes towards Dysphonic Voices

Abstract

Method: We used a database of connected speech samples (CAPE-V sentences) produced by 44 subjects of both genders, with different overall severity (OS) of vocal deviation (healthy, mild, moderate and intense) and different degrees of roughness (GR), breathiness (GB) and strain (GS). The samples were presented to 152 listeners of both genders, who judged 12 attitudes on a semantic differential scale previously validated for this study. The ratings of the 12 attributes were used to categorize the valence of each judgment as positive, neutral or negative. From the six concatenated CAPE-V sentences, 25 acoustic measurements were extracted, including fundamental frequency (fo) measurements, traditional perturbation and noise measurements, and cepstral/spectral measurements. The database was partitioned into a training set and a test set. We balanced the data by upsampling and applied k-fold cross-validation, repeating the validation procedure 30 times with subsets of 11 samples. We tested 10 machine learning (ML) classifiers and evaluated their performance using accuracy, sensitivity, specificity, and Kappa.
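As an illustration only, the balancing and resampling steps named in the Method could be sketched as follows. This is not the authors' code: the function names `upsample` and `repeated_kfold` are hypothetical, random duplication of minority-class items is only one common form of upsampling, and the choice of 4 folds (so that 44 samples give subsets of 11) is an assumption, since the abstract does not state the number of folds or the size of the training set.

```python
import random
from collections import Counter, defaultdict


def upsample(samples, labels, seed=0):
    """Randomly duplicate minority-class items until every class
    has as many items as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # Draw extra copies with replacement to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def repeated_kfold(n_samples, k, repeats, seed=0):
    """Yield (train_idx, test_idx) index lists for `repeats`
    independently shuffled k-fold partitions (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


# Assumed setup: 44 samples, 4 folds of 11, procedure repeated 30 times.
splits = list(repeated_kfold(n_samples=44, k=4, repeats=30))
```

In a sketch like this, upsampling would usually be applied to each training fold only, so that duplicated items never appear in both the training and the validation subset. The abstract does not say at which stage the authors balanced the data.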
Results: The Kernel Support Vector Machine (Kernel SVM) and Naive Bayes (NB) models obtained the best performances. The Kernel SVM selected four acoustic measures: minimum and maximum fo, smoothed cepstral peak prominence (CPPS) and shimmer. The Kernel SVM showed an accuracy of 0.92, with values of 1.0 and 0.80 for sensitivity and specificity, respectively, as well as a Kappa of 1.0. NB performed similarly to the Kernel SVM. However, NB selected 23 acoustic measurements and was excluded from the analysis because it was not considered a parsimonious model.
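For readers unfamiliar with the four performance measures reported above, the following minimal sketch shows how they are conventionally computed from a two-class confusion matrix. It assumes a binary outcome for simplicity (the abstract describes three valence categories and does not say how they were reduced or averaged). The function name `binary_metrics` and the counts in the example are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and Cohen's kappa from
    confusion-matrix counts (assumes both classes are present)."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Chance agreement: product of the marginal proportions per class.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_e = p_pos + p_neg
    # Kappa rescales observed agreement relative to chance agreement.
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
result = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```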
Conclusion: Among the tested models, the Kernel SVM performed best in predicting listeners’ attitudes towards dysphonic voices, based on a set of four acoustic measures.

First Name: Leonardo
Last Name: Lopes
Author #2 First Name: Deyverson
Author #2 Last Name: Evangelista
Author #3 First Name: Samuel
Author #3 Last Name: Abreu
Author #4 First Name: Marcelo
Author #4 Last Name: Ferreira