Co-Designing Stimuli for Phonetic and Clinical Evaluation of Transgender Voice Therapy
Increased F0 is linked to a voice being perceived as more feminine and is often the target of gender-affirming voice care. Acoustic vowel characteristics are targeted less frequently, despite gender differences in vowel production. For example, female speakers of American and Australian English (AusE) exhibit higher vowel formants than male speakers, particularly in front vowels. AusE female speakers also produce longer vowels.
The use of vowel characteristics as valid outcome measures is hindered because speech data collected in therapy are not optimised for phonetic vowel analysis. The semi-spontaneous or read speech preferred in clinical studies may not contain enough vowel tokens in phonetically balanced positions and can be confounded by a range of factors, such as coarticulation or dialectal variation. The /hVd/ words – the set of words in which vowels are produced in the /h_d/ context, e.g., heed, hid, head – are ideal for phonetic analysis but have low ecological validity. To remedy this issue, we co-designed a set of clinically valid sentences that capture the AusE vowel inventory in phonetically balanced contexts.
The AusE vowel inventory consists of 18 stressed vowels and schwa. For every stressed vowel, three target words were created, each placing the vowel in a monosyllabic word flanked by obstruent consonants to control coarticulatory variation (e.g., /æ/ – Jack, bat, bag). The three words were embedded in a sentence such that one target word was sentence-initial, one medial, and one final (e.g., /æ/ – Jack saw a bat, and put it in his bag). For unstressed schwa, five disyllabic words were generated and embedded in a single sentence. The resulting set contains 18 (stressed vowels) x 3 (target words) = 54 stressed vowel tokens and five unstressed vowel tokens in phonetically balanced positions across 19 sentences.
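The stimulus design described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the names `StressedItem`, `count_tokens`, and `check_positions` are hypothetical, and the only stimulus shown is the /æ/ example given in the text. It tallies the tokens implied by the design and checks that a sentence places its target words in initial, medial, and final position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StressedItem:
    """One stressed vowel with its carrier sentence (hypothetical structure)."""
    vowel: str
    sentence: str
    initial: str
    medial: str
    final: str


def count_tokens(n_stressed_vowels=18, words_per_vowel=3, n_schwa_words=5):
    """Tally tokens and sentences implied by the design in the text."""
    return {
        "stressed": n_stressed_vowels * words_per_vowel,
        "unstressed": n_schwa_words,
        # One sentence per stressed vowel plus one sentence for schwa.
        "sentences": n_stressed_vowels + 1,
    }


def check_positions(item):
    """Check that the target words sit in initial, medial, and final position."""
    words = [w.strip(",.").lower() for w in item.sentence.split()]
    return (
        words[0] == item.initial.lower()
        and words[-1] == item.final.lower()
        and item.medial.lower() in words[1:-1]
    )


# The /æ/ example from the text.
example = StressedItem(
    vowel="æ",
    sentence="Jack saw a bat, and put it in his bag",
    initial="Jack",
    medial="bat",
    final="bag",
)

print(count_tokens())
print(check_positions(example))
```

A check of this kind could be run over the full sentence set to confirm that each stressed vowel is represented once in each sentence position.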
One cis male and one cis female speaker of AusE read the sentences aloud. F1 and F2 values were estimated at visually identified vowel targets; F0 contours were extracted from the entire sentence. Statistical analysis was not conducted owing to the small sample size.
Based on visual inspection, F0, F1, and F2 were comparable to normative values for AusE and showed the expected gender differences. Female speech was characterised by higher and more varied F0, higher F1 and F2, particularly for front vowels, and longer vowel duration.
The results demonstrate that clinically and phonetically valid acoustic vowel measures were captured using our sentence set. The robust gender differences support the use of a small pilot sample before collecting data from a vulnerable population. Future research will include data collection from a gender-diverse population and automation of the formant analysis.