The Effects of Voice Isolation Algorithms on Acoustic Measures in Recordings of CCM Singing


Introduction: Audiostrip and other artificial intelligence algorithms are increasingly being used to separate singing from instrumental accompaniments in recorded music. This separation has numerous applications in voice science and pedagogy; however, the effects of these algorithms on commonly used acoustic metrics have not been examined. This study examines the effect of artificial intelligence algorithms, such as Audiostrip, on cepstral peak prominence (CPP), correlation dimension (D2), and voice type component profile (VTCP) values that quantify the periodicity and aperiodicity of the singing voice. The aim is to determine the validity of these measures on audio recordings of voice that have been post-processed.

Methods: Audio recordings of contemporary commercial music (CCM) singing, free of background accompaniment and post-processing effects, were obtained from public databases. These recordings were mixed with accompaniment tracks and post-processed according to industry standards by a music producer with [X] years of professional experience. A Grammy award-winning CCM producer, blinded to the study aims, reviewed the produced recordings for ecological validity.

Professionally mixed and produced audio tracks of CCM singing were attached to placeholder video content and uploaded to YouTube. The recordings were subsequently downloaded, and voice isolation algorithms (e.g., Audiostrip, PopPop AI Vocal Remover, Vocal Remover) were used to separate the singing from the background accompaniment. Both the pre-production voice-only recordings and the processed, downloaded recordings were analyzed acoustically with CPP, D2, VTCP, and other acoustic metrics. Appropriate statistical analyses were performed to compare measures from the processed recordings with those from their unprocessed counterparts.
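
As an illustration of one of the measures named above, the following is a minimal single-frame sketch of cepstral peak prominence, assuming NumPy. It is a simplified variant and not the implementation used in this study, which the abstract does not specify. The function name, the F0 search range (60 to 400 Hz), and the 1 ms lower bound for the regression line are assumptions chosen for illustration.

```python
import numpy as np


def cepstral_peak_prominence(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return (cpp_db, f0_estimate_hz) for one frame of audio samples.

    Simplified, illustrative variant: all parameter defaults are assumptions,
    not values taken from the study.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame) - (len(frame) % 2)  # use an even number of samples
    windowed = frame[:n] * np.hanning(n)

    # Log-magnitude spectrum in dB (small offset avoids log of zero).
    log_spectrum_db = 20.0 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)

    # Cepstrum: inverse transform of the log spectrum, expressed in dB.
    cepstrum = np.fft.irfft(log_spectrum_db, n=n)
    cepstrum_db = 20.0 * np.log10(np.abs(cepstrum) + 1e-12)
    quefrency = np.arange(n) / sample_rate  # seconds

    # Search for the cepstral peak in the quefrency range of plausible F0.
    lo = int(sample_rate / f0_max)
    hi = min(int(sample_rate / f0_min), n // 2 - 1)
    peak_index = lo + int(np.argmax(cepstrum_db[lo:hi + 1]))

    # Regression line fitted to the cepstrum from 1 ms up to half the frame.
    start = int(0.001 * sample_rate)
    slope, intercept = np.polyfit(
        quefrency[start:n // 2], cepstrum_db[start:n // 2], 1
    )
    baseline_db = slope * quefrency[peak_index] + intercept

    cpp_db = cepstrum_db[peak_index] - baseline_db
    return cpp_db, sample_rate / peak_index
```

In a comparison of the kind described above, such a function would be applied frame by frame to each unprocessed recording and to its algorithmically isolated counterpart, and the resulting values summarized per recording before statistical testing.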

Anticipated Results: It is expected that algorithmic isolation will alter some spectral and nonlinear parameters, producing minor reductions in CPP and modest increases in D2 relative to unprocessed recordings. VTCP classifications are anticipated to remain stable for primarily periodic phonation but may show increased irregularity labeling in recordings with dense accompaniment.

Conclusion: This study will clarify the acoustic consequences of AI-based voice isolation in CCM singing and evaluate the validity of applying CPP, D2, and VTCP to post-processed recordings. Results will inform the use of online and commercially distributed material in voice science research, supporting broader access to ecologically valid vocal data while identifying algorithmic limitations for precise quantitative analysis.

David Meyer, Matthew Edwards, Grayson J. Bienhold, Owen P. Wischhoff, Jakob Holm, Matthew R. Hoffman, Jack J. Jiang