A Study on the Acoustic Characterization of Soprano and Tenor Voices in Dual-Language (Italian vs. Chinese) Singing
Objective: In vocal performance, music and language are inseparable. However, current vocal acoustics research focuses predominantly on the acoustic characteristics of the voice itself and has yet to address how vocal performance manifests acoustically in multilingual contexts. Because multilingual singing is common in practical performance settings, this study concentrates on the acoustic features of two voice types (soprano and tenor) performing in two languages (Italian and Chinese). Through perturbation analysis, formant analysis, and long-term average spectrum (LTAS) analysis, we characterize these voice types in Italian and Chinese singing, offering data and theoretical support for vocal pedagogy and practice.
Method: This study recruited 10 undergraduate voice majors (5 sopranos and 5 tenors). Each soprano performed both the Italian and Chinese versions of the same classical song, and each tenor likewise sang both language versions of the same piece, yielding 20 valid samples in total. All recordings were annotated, and the relevant singing materials extracted, using Praat; statistical analyses were conducted in a Python environment. Each dataset was quantified through multiple voice-quality indices based on fundamental frequency (F0), formants, and sung vowels. The study compared the acoustic characteristics of the two voice types when singing in Italian versus Chinese through perturbation analysis (Jitter and Shimmer), formant comparison, and long-term average spectrum (LTAS) analysis.
For this experiment, recordings were made with a D-Command integrated digital console paired with Avid Pro Tools software. The singing signal was captured by a condenser microphone (Neumann U87) positioned approximately 15 cm from the participant’s lips. The audio signal was routed through the D-Command console into the computer and simultaneously recorded in Pro Tools at a sampling rate of 48 kHz. The entire data-collection process took place in the recording studio of the Art School, **** University, with a room noise floor of approximately 20 dB.
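The perturbation measures named in the Method can be illustrated with a minimal sketch. It assumes the common "local" definitions of Jitter and Shimmer (mean absolute difference between consecutive cycles, divided by the mean); the abstract does not state which variant was used, and the function names and cycle values below are hypothetical, not data from the study.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the period (assumed 'local' variant)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (assumed 'local' variant)."""
    return local_perturbation(peak_amplitudes)


# Hypothetical cycle data for one stable /a/ segment (illustrative only).
periods = [0.00227, 0.00228, 0.00226, 0.00227, 0.00229]   # seconds per cycle
amplitudes = [0.81, 0.79, 0.82, 0.80, 0.78]               # linear peak amplitude

jitter_pct = 100 * jitter_local(periods)
shimmer_pct = 100 * shimmer_local(amplitudes)
```

Both measures are ratios, so they are usually reported as percentages; a perfectly periodic signal with constant amplitude gives zero for both.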
Conclusion: The results show that, in the stable segment of the vowel /a/, mean Jitter and Shimmer were both lower for Italian than for Chinese. Paired t-tests revealed no significant difference for Jitter, whereas Shimmer was significantly lower in Italian (p = 0.048, d = 0.58). Formant data showed that, at the same fundamental frequency, Italian vowels had higher F1/F2 energy than Chinese ones. Sopranos displayed an upward-shifted F2 system, and both voice types followed a more posterior and open resonant path in Italian, demonstrating that language itself measurably affects vocal-tract shape and energy distribution. Additionally, regardless of voice type, the LTAS consistently displayed the singer’s formant in the 2.5–3.5 kHz region, confirming that professional vocal training yields cross-language consistency in vocal-tract morphology and resonance frequencies. The LTAS results indicate that language only fine-tunes local spectral shape without altering the core singing-resonance feature, the singer’s formant.
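The paired comparison reported above can be sketched as follows. This is a minimal illustration, assuming the pairs are formed per singer (Italian vs. Chinese rendition) and that the effect size is Cohen's d for paired samples (mean difference divided by the standard deviation of the differences); the abstract does not specify which d variant or test sidedness was used. The function name and the Shimmer values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_d(x, y):
    """Return (t statistic, degrees of freedom, paired-samples Cohen's d)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)                      # sample SD of the pairwise differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    d_z = mean(diffs) / sd                 # effect size (assumed d_z variant)
    t = d_z * sqrt(len(diffs))             # paired t statistic
    return t, len(diffs) - 1, d_z


# Hypothetical Shimmer values (%) for 10 singers; not data from the study.
italian = [3.1, 2.8, 3.5, 2.9, 3.3, 3.0, 2.7, 3.4, 3.2, 2.6]
chinese = [3.4, 2.9, 3.6, 3.3, 3.2, 3.5, 2.9, 3.6, 3.1, 3.0]

t_stat, dof, d_z = paired_t_and_d(italian, chinese)
```

The p-value then follows from the t distribution with `dof` degrees of freedom; in a SciPy environment, `scipy.stats.ttest_rel(italian, chinese)` returns the same t statistic together with a two-sided p-value.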