How the Brain Imagines the Voice: Insights from Neuroimaging Analyses


Vocal motor imagery—the mental simulation of voice production without overt movement—has long been employed in voice training and rehabilitation. It is thought to be enhanced when emotionally motivated, as emotion may help synchronize motor processes in ways that support achieving an intended vocal quality. However, imagery-based approaches have at times been linked to metaphorical concepts lacking empirical grounding, underscoring the need for more rigorous investigation.
This study examined (a) the neural activity patterns underlying vocal motor imagery, (b) whether these patterns are modulated by emotion-driven imagery, and (c) whether they relate to participants’ experienced imagery intensity and to personality-like dispositions.
Six volunteers familiar with expressive vocal activity underwent functional magnetic resonance imaging (fMRI) while imagining producing short, non-verbal vocalizations conveying four different emotions; rest periods were included as a baseline. Each imagery block was followed by a rating task in which participants assessed the perceived strength of their vocal motor simulation. Participants also completed a questionnaire assessing social and affective dispositions thought to influence immersion in imagery and emotionally driven vocal simulation. fMRI data were analyzed using univariate voxelwise general linear model (GLM) approaches and multivariate pattern analysis (MVPA), a decoding-based method that characterizes how information is represented in the brain through distributed voxel activation patterns at the individual-subject level. Neural activation patterns were then compared with participants’ ratings and questionnaire scores indexing personality-like traits.
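The decoding logic behind MVPA — training a classifier on distributed voxel activation patterns from most of the data and testing it on held-out data — can be sketched as follows. This is a minimal illustration on synthetic patterns, not the study's actual pipeline: the emotion labels, voxel count, run structure, and nearest-centroid classifier are all assumptions made for demonstration.

```python
import random

random.seed(0)

EMOTIONS = ["joy", "tenderness", "anger", "fear"]  # four imagined emotions (labels assumed)
N_VOXELS = 50  # size of each voxel pattern (assumed)
N_RUNS = 6     # scanning runs, used for leave-one-run-out cross-validation (assumed)

# Synthetic data: each emotion has a characteristic mean pattern;
# each run contributes that pattern plus random noise.
prototypes = {e: [random.gauss(0, 1) for _ in range(N_VOXELS)] for e in EMOTIONS}
data = {run: {e: [m + random.gauss(0, 0.3) for m in prototypes[e]] for e in EMOTIONS}
        for run in range(N_RUNS)}

def centroid(patterns):
    """Voxelwise mean of a list of patterns."""
    return [sum(vals) / len(vals) for vals in zip(*patterns)]

def distance(a, b):
    """Euclidean distance between two voxel patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Leave-one-run-out cross-validation: fit one centroid per emotion on the
# training runs, then label each held-out pattern by its nearest centroid.
correct = total = 0
for test_run in range(N_RUNS):
    train_runs = [r for r in range(N_RUNS) if r != test_run]
    centroids = {e: centroid([data[r][e] for r in train_runs]) for e in EMOTIONS}
    for true_emotion in EMOTIONS:
        pattern = data[test_run][true_emotion]
        predicted = min(EMOTIONS, key=lambda e: distance(pattern, centroids[e]))
        correct += predicted == true_emotion
        total += 1

accuracy = correct / total
print(f"decoding accuracy: {accuracy:.2f} (chance = {1 / len(EMOTIONS):.2f})")
```

In practice, MVPA studies typically use a linear classifier (e.g., a support vector machine) rather than nearest centroids, and above-chance accuracy is usually established with permutation tests; the cross-validation structure shown here is the common element.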
Results showed that the neural activity patterns underlying vocal motor imagery were modulated by emotion-driven imagery. MVPA also successfully decoded the intended emotion embedded in the vocal motor imagery from distributed voxel activity, although classification accuracy varied substantially across individuals. Classification algorithms performed better at distinguishing between imagined voices expressing positive and negative emotions—and even between two negative emotions—than between the two positive emotions. No systematic relationship emerged between decoding accuracy and the experienced intensity of vocal motor simulation. Notably, the participant with the highest decoding accuracy also exhibited the highest perspective-taking score, although this observation remains preliminary given the small sample size.
These findings highlight MVPA as a promising method for investigating the neural representations of vocal motor imagery, revealing how individuals encode information through mental processes traditionally considered private and subjective. The absence of a systematic relationship between decoding accuracy and subjective imagery experience underscores the need to refine self-report measures of imagery. The hypothesized association between personality-like dispositions and neural activation patterns requires further investigation. Potential applications of this work span voice pedagogy, clinical rehabilitation, and brain–computer interfaces.

Gláucia Laís Salomão