Effects of Linguistic Context and Intonation on Vocal Gender Perception: Human and AI Comparisons Using Real and Nonsense Speech


Objective: This study examined how linguistic context and intonation patterns influence gender perception by comparing human and AI judgments of natural language and matched nonsense (“gibberish”) speech samples. The primary aim was to determine whether removing semantic and lexical information affects accuracy, confidence, and consistency in gender identification.
Methods: Six adult listeners with normal hearing and one AI model participated in a perceptual experiment using speech samples from 12 speakers (4 transgender women, 4 cisgender women, 4 cisgender men). Each speaker produced spontaneous responses to set prompts, which were used to generate corresponding nonsense versions by reordering phonemes while maintaining prosodic patterns. Listeners classified each sample as male or female and rated their confidence on a 5-point scale. The AI model provided the probability of each sample being male or female based on acoustic analysis. Accuracy, confidence, and response time were compared between stimulus types using one-way ANOVAs. Pearson correlation analyses examined relationships among confidence, accuracy, and mean speaking pitch across stimuli.
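To make the analysis concrete, the sketch below shows how the reported statistics could be computed (one-way ANOVAs comparing measures between stimulus types, and Pearson correlations among confidence, accuracy, and mean speaking pitch). The file name, column names, and long-format data layout are assumptions for illustration, not the authors' actual pipeline.

```python
# Minimal sketch of the reported analyses; data layout and column names
# are assumed, not taken from the study's actual materials.
import pandas as pd
from scipy import stats

# Hypothetical long-format table: one row per listener judgment.
# Assumed columns: stimulus_type ("real"/"nonsense"), accuracy (0/1),
# confidence (1-5), response_time_s, mean_pitch_hz
df = pd.read_csv("gender_perception_responses.csv")

# One-way ANOVA comparing each dependent measure between stimulus types
for measure in ["accuracy", "confidence", "response_time_s"]:
    groups = [g[measure].values for _, g in df.groupby("stimulus_type")]
    f_stat, p_val = stats.f_oneway(*groups)
    print(f"{measure}: F = {f_stat:.2f}, p = {p_val:.3f}")

# Pearson correlations among confidence, accuracy, and mean speaking pitch
for x, y in [("confidence", "accuracy"),
             ("confidence", "mean_pitch_hz"),
             ("accuracy", "mean_pitch_hz")]:
    r, p = stats.pearsonr(df[x], df[y])
    print(f"r({x}, {y}) = {r:.2f}, p = {p:.3f}")
```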
Results: Findings revealed higher accuracy and confidence for meaningful language stimuli compared with nonsense speech. Removal of linguistic context increased response variability, and human participants frequently requested multiple listening trials, indicating greater perceptual ambiguity. Interestingly, AI classification probabilities also varied across the nonsense stimuli, though overall accuracy remained more stable than that of human listeners, suggesting reduced dependence on linguistic information. Confidence ratings were positively correlated with both identification accuracy and mean speaking pitch.
Conclusions: Linguistic content enhances gender perception in human listeners, likely by engaging higher-level cognitive and sociolinguistic processes that complement acoustic gender cues. When semantic information is removed, listeners depend more heavily on acoustic features, leading to reduced accuracy, confidence, and perceptual stability. Although AI judgments were generally more consistent than those of human listeners, shifts in classification probabilities across nonsense stimuli suggest that linguistic structure may still influence acoustic-based modeling to some extent. These findings highlight the interactive role of linguistic and acoustic information in gender perception and provide insights relevant to gender-affirming voice assessment, speech synthesis, and perceptual training.

Ümit Daşdöğen, Kyle Abramson, Leanne Goldberg, Yael Bensoussan, Mark Courey