Rescaling the Voice: Innovations in Standardized Perceptual Assessment
Background/Rationale: Ideal clinical outcome measures for voice quality would have strong scientific validity and yield measurements that are highly precise, repeatable, and based on known mathematical properties that allow intuitive use and interpretation. Although the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) is valid, its use in current clinical practice often suffers from poor reliability within and across raters and from limited protocol adherence. Furthermore, the ordinal properties of CAPE-V are ill-suited to the quantification of outcomes. One approach has been to develop dimension-specific scales for clinical use with standard physical units, similar to the sone scale of loudness. More recently, three-dimensional (breathiness, roughness, strain) scales were developed to capture and account for the natural covariance across these dimensions.
Objective: To evaluate the reliability and validity of a novel three-dimensional matching task (QualEVox3D) designed for clinical use.
Methods: Sustained /a/ phonations from 30 children with a glottal vibratory source (GVS) and 30 children with a supraglottal vibratory source (SGVS) were used in perceptual testing. In the QualEVox3D matching task, listeners (7 adults; 4 females, 3 males; mean age 27 years) matched dysphonic stimuli to synthetic comparisons by simultaneously adjusting the signal-to-noise ratio (breathiness), amplitude modulation depth (roughness), and bandpass filter gain (strain) of a standard comparison signal. The listeners also completed a three-dimensional magnitude estimation (3DME) task, and both data sets were compared to CAPE-V ratings from three experts.
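To make the three adjustable parameters concrete, the following is a minimal sketch of how a comparison signal with those controls might be generated. It is not the QualEVox3D synthesizer: the harmonic source, the 30 Hz modulation rate, the 2 to 4 kHz band, and the function names `synth_comparison` and `mean_power` are all assumptions chosen for illustration, and the band gain is applied directly to harmonic amplitudes rather than through an actual filter.

```python
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def synth_comparison(snr_db, am_depth, band_gain_db, f0=220.0, sr=16000,
                     dur=0.5, am_rate=30.0, band=(2000.0, 4000.0), seed=0):
    """Return (tone, noise) sample lists; their sum is the comparison signal.

    All defaults are illustrative assumptions, not values from the study.
    snr_db       -> level of added noise relative to the tone (breathiness)
    am_depth     -> amplitude modulation depth, 0..1 (roughness)
    band_gain_db -> gain applied to harmonics inside `band` (strain)
    """
    n = int(sr * dur)
    band_gain = 10 ** (band_gain_db / 20)

    # Harmonic source with a 1/h amplitude roll-off; harmonics that fall
    # inside the band are scaled by the band gain.
    harmonics = []
    h = 1
    while h * f0 < sr / 2:
        amp = 1.0 / h
        if band[0] <= h * f0 <= band[1]:
            amp *= band_gain
        harmonics.append((h * f0, amp))
        h += 1

    tone = []
    for t in range(n):
        sample = sum(a * math.sin(2 * math.pi * f * t / sr)
                     for f, a in harmonics)
        # Sinusoidal amplitude modulation of the harmonic source.
        envelope = 1.0 + am_depth * math.sin(2 * math.pi * am_rate * t / sr)
        tone.append(sample * envelope)

    # Gaussian noise scaled so that tone power / noise power matches snr_db.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    target_noise_power = mean_power(tone) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    noise = [scale * x for x in noise]
    return tone, noise


tone, noise = synth_comparison(snr_db=10.0, am_depth=0.5, band_gain_db=6.0)
comparison = [a + b for a, b in zip(tone, noise)]
```

In a matching task of this kind, a listener would adjust the three arguments until the comparison sounds like the target voice, and the final settings would serve as the measurement in physical units.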
Results: Initial results, based on intraclass correlation coefficients [ICC(2,k)], indicate strong intra-rater (r = 0.83 for roughness; 0.78 for breathiness; 0.72 for strain) and inter-rater (r = 0.95 for roughness; 0.93 for breathiness; 0.84 for strain) reliability for QualEVox3D. The raters were similarly reliable in the 3DME task, both within (r > 0.74) and across (r > 0.88) raters. Across all 60 samples, correlations among the three perceptual judgments (QualEVox3D units, 3DME estimates, and CAPE-V scores) were high and significant for all three voice qualities (r values ranging from 0.75 to 0.96), supporting the validity of the novel task.
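For readers unfamiliar with the statistic, ICC(2,k) is the two-way random-effects, absolute-agreement, average-measures form of the intraclass correlation, computed from the mean squares of a samples-by-raters table. The sketch below shows that standard formula under the assumption of a complete table with no missing ratings; the function name `icc_2k` and the ratings are illustrative only and are not data from this study.

```python
def icc_2k(ratings):
    """ICC(2,k) for a complete table: one row per sample, one column per rater."""
    n = len(ratings)        # number of rated samples
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between samples
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Average-measures, absolute-agreement form.
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


# Illustrative ratings only: 6 samples (rows) judged by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2k(example), 2))
```

Because this form penalizes systematic offsets between raters as well as random disagreement, it is stricter than a plain correlation between raters.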
Conclusions: These results provide empirical evidence for the validity of QualEVox3D in the evaluation of pediatric dysphonic voices.