`pyLeTalker`: Python Framework for Wave Reflection Voice Synthesis


Objective: In silico voice and speech synthesis has been of great interest to researchers for decades and is an immensely useful technology in modern society. However, the pathological voice research field still lacks a publicly available voice synthesis tool that fully satisfies its research needs. The available tools (e.g., Praat or VocalTractLab) are often geared toward normal speech production and lack flexibility in configuring the vocal fold vibration. For example, Titze's kinematic vocal fold model (1984) is perhaps the model best suited to simulating disordered conditions because its vocal fold vibration patterns are directly programmable, but to the authors' knowledge none of the freely available tools fully exploits its capability. We have ported Story's Matlab LeTalker voice synthesizer to Python and developed a highly customizable package named `pyLeTalker`. The synthesizer is built around the wave-reflection vocal tract model and provides a flexible platform for introducing pathological conditions and symptoms.

Design: The `pyLeTalker` package defines four types of synthesis elements: vocal tract (for both the supralaryngeal vocal tract and the trachea), vocal folds, lung, and lip. Each element has a two-port pressure network interface (forward pressure input/output ports and backward pressure input/output ports) and is implemented as a Python class. For example, the `LeTalkerVocalTract`, `LeTalkerVocalFolds`, `LeTalkerLung`, and `LeTalkerLip` classes are elements ported from the original LeTalker Matlab simulator. In addition, the package includes a `KinematicVocalFolds` class for Titze's kinematic model. Every element can be configured to be dynamic, which enables time-varying systems that simulate speech or speech-like contexts or bifurcations of vocal fold vibration patterns. Finally, users can readily develop a new element and use it in a simulation.
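To illustrate what a two-port pressure interface means in a wave-reflection model, the following is a minimal sketch of a textbook Kelly-Lochbaum scattering junction between two tube sections. It is not taken from `pyLeTalker`: the class name `TwoPortJunction`, its `step` method, and the area parameters are hypothetical and chosen only for illustration. The sketch assumes lossless plane-wave propagation, with pressure and volume velocity continuous across the junction.

```python
from dataclasses import dataclass


@dataclass
class TwoPortJunction:
    """Hypothetical two-port element (not the pyLeTalker API).

    Models the junction between two tube sections with cross-sectional
    areas `area_in` (upstream) and `area_out` (downstream).
    """

    area_in: float
    area_out: float

    @property
    def reflection(self) -> float:
        # Pressure-wave reflection coefficient seen by the forward wave.
        return (self.area_in - self.area_out) / (self.area_in + self.area_out)

    def step(self, f_in: float, b_in: float) -> tuple[float, float]:
        """Scatter one sample.

        f_in: forward pressure wave arriving from the upstream section.
        b_in: backward pressure wave arriving from the downstream section.
        Returns (f_out, b_out): the forward wave leaving downstream and
        the backward wave leaving upstream.
        """
        r = self.reflection
        f_out = (1.0 + r) * f_in - r * b_in
        b_out = r * f_in + (1.0 - r) * b_in
        return f_out, b_out


# Example: a constriction from area 3 to area 1 (arbitrary units).
junction = TwoPortJunction(area_in=3.0, area_out=1.0)
f_out, b_out = junction.step(f_in=1.0, b_in=0.0)
```

In this sketch a time-varying element would simply update `area_in` and `area_out` between samples, which is one way to picture the dynamic configuration described above.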

Results: We present four synthesis demonstrations: (1) modal voices with mass-spring and kinematic vocal fold models, (2) breathy voice with asymmetrical vocal folds, (3) subharmonic voice using modulated sinusoidal vibration with bifurcations, and (4) a vowel-consonant-vowel pattern.

Conclusions: Our goal for `pyLeTalker` is to offer a framework that expedites research projects involving synthesized disordered voices. The package will be distributed as open source on GitHub for other researchers to use and contribute to.

Takeshi Ikuma, Melda Kunduk, Andrew McWhorter, Brad Story