CepstralVox: Open-Source Tool for Cepstral Voice Analysis


Introduction
Cepstral Peak Prominence (CPP) and its smoothed variant (CPPS) are widely recognized as robust acoustic markers of voice quality, correlating strongly with auditory-perceptual ratings of dysphonia and breathiness. However, clinical implementation is often limited by the technical demands of existing analysis platforms, variability in preprocessing, and differences across software packages. These factors create barriers to reproducible and accessible cepstral analysis, particularly in connected speech, where preprocessing steps such as voice activity detection and pause handling are not standardized. To address this gap, we developed CepstralVox, an open-source application designed to provide transparent, reproducible, and clinician-friendly CPP/CPPS extraction.
Methods
CepstralVox is implemented in Python with a graphical interface that integrates dynamic spectrogram visualization, region-of-interest selection, and single-file or batch processing. The software automatically converts stereo audio to mono and applies optional preprocessing for connected speech, including pitch-based voiced region extraction and pause removal. Cepstral computation is performed via Praat, ensuring consistency with the established cepstral literature. To validate analytic equivalence, CPPS was measured in 294 recordings from the Perceptual Voice Qualities Database using (1) Praat and (2) CepstralVox, under matched parameter settings, for both sustained vowels and connected speech. Agreement was evaluated using Pearson correlation, Lin’s concordance correlation coefficient (CCC), ordinary least squares regression, and Bland–Altman analysis.
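The four agreement measures named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the CepstralVox validation code: the function name `agreement_stats`, the choice of population variances for the CCC (the form in Lin's original definition), the sign convention for differences (candidate minus reference), and the 1.96 multiplier for the limits of agreement are all assumptions made for this example.

```python
import numpy as np


def agreement_stats(reference, candidate):
    """Agreement metrics for paired measurements (e.g. CPPS in dB).

    Hypothetical helper for illustration only. `reference` could hold the
    Praat values and `candidate` the CepstralVox values for the same files.
    """
    x = np.asarray(reference, dtype=float)
    y = np.asarray(candidate, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 3:
        raise ValueError("need two 1-D arrays of equal length (>= 3)")

    # Pearson correlation: strength of the linear relationship.
    pearson_r = np.corrcoef(x, y)[0, 1]

    # Lin's CCC: penalizes both scatter and systematic shift away from
    # the identity line. Population (ddof=0) moments are assumed here.
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    ccc = 2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2)

    # Ordinary least squares fit of candidate on reference.
    slope, intercept = np.polyfit(x, y, 1)

    # Bland-Altman: mean difference (bias) and 95% limits of agreement.
    # Assumed convention: difference = candidate - reference.
    diff = y - x
    bias = diff.mean()
    sd_diff = diff.std(ddof=1)
    limits = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)

    return {
        "pearson_r": pearson_r,
        "ccc": ccc,
        "slope": slope,
        "intercept": intercept,
        "bias": bias,
        "loa_lower": limits[0],
        "loa_upper": limits[1],
    }
```

Calling `agreement_stats(praat_values, cepstralvox_values)` on two equal-length arrays returns all metrics in one dictionary. Reporting the CCC alongside Pearson r matters because r alone stays at 1.0 under a constant offset between tools, whereas the CCC drops.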
Results
CepstralVox demonstrated near-perfect agreement with Praat across both sustained vowels and connected speech. For sustained vowels (n = 294), CPPS values showed a mean difference of −0.0056 dB, narrow 95% limits of agreement (−0.0291 to 0.0179 dB), Pearson r = 0.999997, and CCC = 0.999995. For connected speech (n = 294), mean difference was −0.0017 dB, with similarly narrow limits of agreement, Pearson r = 0.999991, and CCC = 0.999990. Regression slopes approximated 1.00, with minimal intercepts, indicating measurement equivalence and confirming that preprocessing and interface automation do not alter analytical results.
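As a consistency check on the sustained-vowel figures, the reported limits of agreement can be inverted to recover the bias and the spread of the differences. The sketch below assumes the limits were computed in the conventional Bland–Altman form, mean difference ± 1.96 × SD; the abstract does not state the multiplier, and the function name `bias_and_sd_from_limits` is hypothetical.

```python
def bias_and_sd_from_limits(lower, upper, z=1.96):
    """Recover the mean difference and SD implied by limits of agreement.

    Assumes limits = bias +/- z * sd, with z = 1.96 by default
    (an assumption; the source does not state the multiplier).
    """
    bias = (lower + upper) / 2.0       # midpoint of the interval
    sd = (upper - lower) / (2.0 * z)   # half-width divided by z
    return bias, sd


# Reported sustained-vowel limits of agreement, in dB.
bias, sd = bias_and_sd_from_limits(-0.0291, 0.0179)
```

The midpoint of the reported interval reproduces the stated mean difference of −0.0056 dB, and under the 1.96 assumption the implied SD of the differences is roughly 0.012 dB.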
Conclusions
CepstralVox reproduces Praat-derived CPPS values with negligible bias while offering a streamlined, reproducible workflow suitable for clinical assessment and large-scale research. By integrating transparent preprocessing, parameter control, real-time visualization, and batch automation, the software addresses a key barrier to the broader adoption of cepstral metrics in clinical voice evaluation. Its open-source implementation and cross-platform availability support both clinical translation and methodological reproducibility, enabling standardized use of CPP/CPPS across varied populations and speech tasks.

Tiago Cruz