Please use this searchable database to view abstract information from our 53rd Annual Symposium in 2024

Abstract Title

Evaluating the Performance of Voice Extraction and Fundamental Frequency Detection in Baroque Music

Abstract

Accurate and insightful scientific voice analysis requires pristine laboratory data: high-quality, noise-free recordings without any background instruments. This typically restricts the participant pool to local individuals, excluding famous singers and deceased iconic artists and role models. It would thus be beneficial to conduct voice analysis on commercial recordings, in which the voice is usually mixed with orchestral accompaniment. To use such recordings for scientific analysis, the voice signal must be faithfully isolated. The isolated signal could then be used to extract basic voice features such as the fundamental frequency (fo). However, while various voice isolation techniques exist, their performance in the context of fo extraction has not yet been rigorously assessed.

To address this, we systematically tested the performance of three voice isolation methods with respect to fo extraction. Seven commercial recordings of “Ombra mai fu” by G.F. Händel with different orchestrations (piano, harpsichord, and strings) were investigated. The orchestral introduction of each recording was mixed with a synthesized singing voice that harmonically matched the respective chords of the introduction. The voice was synthesized with a linear source-filter approach for the vowels /u/, /ɔ/, and /ʌ/. The synthesized voice was added to the orchestral introduction at different signal-to-noise ratios (SNRs), resulting in 105 sound samples. Four conditions for fo extraction were tested: the three voice isolation methods, namely iZotope RX 10, Moises.ai, and robust principal component analysis (RPCA; Nestorova et al., 2023), and a “baseline” scenario without voice isolation. For each of these four conditions, fo was estimated with Praat’s autocorrelation method in two ways: “as is”, and after additional bandpass filtering (2200 to 6000 Hz). The estimated fo values were compared to the fo known from the synthesized data.
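As a rough illustration of two steps in this procedure, the following Python sketch mixes a voice signal with an accompaniment at a target SNR and expresses the deviation of an estimated fo from the known fo in cents. It makes several assumptions that the abstract does not confirm: SNR is taken as the ratio of the RMS levels of voice and accompaniment in dB, the deviation is measured in cents, and the two sine tones are hypothetical stand-ins for the synthesized voice and the orchestral introduction. The function names `rms`, `mix_at_snr`, and `cents_error` are illustrative only.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(voice, accompaniment, snr_db):
    """Add the accompaniment to the voice at a target SNR.

    Assumption: SNR = 20 * log10(rms(voice) / rms(scaled accompaniment)).
    Returns the mixture and the gain applied to the accompaniment.
    """
    gain = rms(voice) / (rms(accompaniment) * 10 ** (snr_db / 20))
    mixed = [v + gain * a for v, a in zip(voice, accompaniment)]
    return mixed, gain


def cents_error(f_estimated, f_reference):
    """Signed deviation of an fo estimate from the reference fo, in cents."""
    return 1200 * math.log2(f_estimated / f_reference)


# Hypothetical stand-in signals: one second of sine tones at 8 kHz.
SAMPLE_RATE = 8000
voice = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
         for n in range(SAMPLE_RATE)]
accompaniment = [0.3 * math.sin(2 * math.pi * 330 * n / SAMPLE_RATE)
                 for n in range(SAMPLE_RATE)]

# Mix at an arbitrary example SNR of 6 dB (not a value from the study).
mixed, gain = mix_at_snr(voice, accompaniment, snr_db=6.0)

# An estimate one octave above the known fo would be off by +1200 cents.
octave_error = cents_error(440.0, 220.0)
```

In an evaluation of this kind, the fo track estimated from each mixture (with or without prior voice isolation) would be scored against the known synthesis fo frame by frame; octave jumps then show up as deviations near ±1200 cents.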

Pending further investigation, preliminary findings suggest that SNR levels have a significant impact on fo extraction quality, with additional effects from the extraction method, orchestral accompaniment type, and synthesis vowel. Bandpass filtering – removing most of the acoustic energy of the accompaniment but retaining the harmonic structure of the singing voice – considerably improved the fo extraction performance.

Results cannot be generalized to other music samples. Caution is necessary when using voice isolation methods to facilitate the analysis of vocal acoustic parameters. Although errors in the analysis of the extracted fo can be corrected, it is important to carefully consider the impact of SNR and musical accompaniment on the final output. Future research on these methods should incorporate data related to harmonics and formants.

First Name: Tiago
Last Name: Cruz
Author #2 First Name: Pedro
Author #2 Last Name: Andrade
Author #3 First Name: Manuel
Author #3 Last Name: Brandner
Author #4 First Name: Christian
Author #4 Last Name: Herbst