Speech-to-Noise Ratio and Voice-to-Noise Ratio of Some Voice Databases with Implications for Acoustic Voice Analysis
Objectives. This study aimed to examine the Speech-to-Noise Ratio (SNR) and Voice-to-Noise Ratio (VNR) in some currently available voice databases to clarify whether they meet the recording quality requirements for use in acoustic voice analysis.
Methods. This cross-sectional study extracted the prolonged vowel /_/, short phrases, and connected speech from 977 vocally healthy and voice-disordered speakers in six pre-existing voice and speech databases: AVFAD, MEEI, PVQD, SVD, Uncommon Voice, and USVAC. The vocal tasks were taken from randomly selected study IDs, and SNR and VNR were measured using a Praat script. SNR and VNR were summarized using descriptive statistics and compared across databases using multivariate analysis of variance. Vocal task effects and group effects within each database were also examined.
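As a rough illustration of the kind of measure involved, a signal-to-noise level ratio in decibels can be sketched as the ratio of the root-mean-square (RMS) amplitude of a signal segment to that of a noise-only segment. This is not the Praat script used in the study. The function names are hypothetical, and the sketch assumes the segments have already been identified: for an SNR-like value the signal segment would be a stretch of speech, and for a VNR-like value it would presumably be restricted to voiced frames, which is an assumption not stated in this abstract.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    if not samples:
        raise ValueError("segment is empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_ratio_db(signal_segment, noise_segment):
    """Level difference in dB between a signal segment and a noise-only segment.

    Hypothetical helper: how the segments are chosen (speech vs. voiced-only,
    which pause is treated as noise) is an assumption left to the caller.
    """
    signal_rms = rms(signal_segment)
    noise_rms = rms(noise_segment)
    if signal_rms == 0 or noise_rms == 0:
        raise ValueError("ratio is undefined for an all-zero segment")
    # 20*log10 of an amplitude ratio equals 10*log10 of the power ratio.
    return 20.0 * math.log10(signal_rms / noise_rms)


# Synthetic example: a signal with ten times the amplitude of the noise floor.
signal = [1.0, -1.0] * 100
noise = [0.1, -0.1] * 100
ratio_db = level_ratio_db(signal, noise)
```

In this synthetic case the amplitude ratio is 10, which corresponds to a 20 dB difference; real recordings would need careful selection of the noise-only interval for the value to be meaningful.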
Results. SNR and VNR varied widely across databases, with values ranging from very low to high voice recording quality. Among databases with measurable SNR and VNR from multiple tasks, with the exception of the Uncommon Voice database, there were statistically significant effects of task and group on both SNR and VNR (AVFAD, USVAC, and PVQD). In general, the vowel /_/ and phrases had higher SNR and VNR than connected speech. In databases with a single measurable task, group effects were statistically significant for MEEI but not for SVD.
Conclusions. These databases provide voice samples of variable recording quality, which may affect the accuracy of voice disorder discrimination using acoustic analysis or machine learning/AI algorithms. The lower values of these measures in connected speech tasks in these databases suggest an influence of linguistic factors on signal quality. The differences in the two measures between healthy and disordered groups in several databases indicate the need to standardize tasks, recording equipment and settings, recording environment, and data collection personnel training to achieve consistent signal quality.