Basic Science Presentations
Acoustic and Psychoacoustic Correlates of Perceived Vocal Strain
David A. Eddins, Ph.D., CCC-A
Supraja Anand, Ph.D.
Madison Dyjak, B.A.
Rahul Shrivastav, Ph.D., CCC-SLP
Abstract
Objective: Few systematic studies have investigated the relationship between acoustic and perceptual correlates of vocal strain and possible acoustic and perceptual correlates. Such correlates may be limited by the co-occurrence of multiple voice qualities. The goal of this study is to determine the potential influence of co-occurring breathiness on strain perception.
Methods: A large set of dysphonic voices were categorized by experts into voices perceived to be primarily strained and voices perceived to have strain along with breathiness. A subset of 18 voices per category were chosen to represent a wide continuum of strain severity based on a visual sort ranking (VSR) procedure. Ten listeners rated 540 /a/ vowels (18 voices x 10 repetitions x 3 trials) per category on a magnitude estimation (ME) task. Predictors of vocal strain included six spectral moment measures computed on a linear scale or from the output of an auditory processing front-end (Bark, excitation, and specific loudness). Sharpness was computed as the first moment of the specific loudness function expressed as a function of critical-band rate. Cepstral peak prominence, pitch height, and pitch strength also served as predictors.
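As a rough illustration of the sharpness predictor described above, the first moment of a specific loudness function is its loudness-weighted mean critical-band rate. The sketch below assumes the specific loudness values have already been computed by some auditory front-end; the function name `first_moment` and the sample values are hypothetical and are not taken from the study. Some published sharpness models apply an additional weighting function at high critical-band rates, which is omitted here.

```python
from typing import Sequence


def first_moment(z_bark: Sequence[float], specific_loudness: Sequence[float]) -> float:
    """Return the loudness-weighted mean critical-band rate (in Bark)."""
    if len(z_bark) != len(specific_loudness):
        raise ValueError("inputs must have the same length")
    total = sum(specific_loudness)
    if total <= 0:
        raise ValueError("total loudness must be positive")
    return sum(z * n for z, n in zip(z_bark, specific_loudness)) / total


# Synthetic illustrative values (assumption), not data from the study.
z = [1.0, 2.0, 3.0, 4.0]
flat = [1.0, 1.0, 1.0, 1.0]    # equal loudness in every band
tilted = [1.0, 1.0, 1.0, 3.0]  # extra loudness in the highest band

print(first_moment(z, flat))    # 2.5
print(first_moment(z, tilted))  # 3.0
```

Shifting loudness toward higher bands raises the first moment, which is the sense in which this measure tracks spectral "sharpness".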
Results: There was a high correlation between the VSR rankings of two experts for both categories of strain voices. Data also indicated high intra- and inter-listener reliability on perceived strain magnitudes. Significant differences were observed in sharpness, some of the transformed spectral energy metrics, CPP, and pitch strength between the two categories of strain. Additional regression analyses will be completed to evaluate the relationship between acoustic predictors and the perceived strain magnitudes from the ME tasks.
Conclusions: Results of this study will aid in the development of computational models that account for covariance in voice qualities and will help to identify a suitable comparison stimulus for matching tasks to evaluate perception of strain.

Gender Perception of Speech: Dependence on Formant Space Configuration, Fundamental Frequency, and Source Spectral Tilt
TJ Neuhaus, MS, PhD Student, Ronald C. Scherer, PhD
Abstract:
Objective: To explore how listeners use three aspects of the acoustic signal to determine speaker gender.
Methods: The software packages Madde, Praat, and Audacity were used to synthesize 210 sound files. The 210 files are the combinations of seven “formant space configurations” (FSC), 10 values for fundamental frequency, and three values for source spectral tilt. Each formant space configuration is the set containing the vowels /i, æ, ɑ, u/ and is based on average values for formant frequencies published in the literature. The lowest formant space configuration (FSC 1 in the figure below) is based on values for formant frequencies for the four vowels that are male-typical, the highest formant space configuration (FSC 7) is based on values for formant frequencies for the four vowels that are female-typical, and the remaining five formant space configurations of the four vowels are spaced in between using semitones. For fundamental frequency, the lowest value is male-typical, the highest value is female-typical, and the remaining eight values are evenly spaced in between using semitones. The three values for source spectral tilt are -18 dB/oct, -14 dB/oct, and -10 dB/oct, which are approximate values for the voice qualities of breathy, normal, and pressed. The listeners are asked to rate the “speaker” of each synthesized vowel set as either male or female. The experiment has been performed on two individuals to verify the methodology and provide preliminary results. Approximately 10 males and 10 females will be recruited this semester and the project will be completed by March 2020.
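The stimulus design above can be sketched in a few lines: values that are evenly spaced in semitones form a geometric series in Hz, and the full factorial design is 7 × 10 × 3 = 210 conditions. The endpoint frequencies of 100 Hz and 200 Hz below are placeholders chosen only for illustration; the abstract does not state the actual male-typical and female-typical values, and the helper name `semitone_steps` is hypothetical.

```python
import itertools
import math
from typing import List


def semitone_steps(low_hz: float, high_hz: float, n: int) -> List[float]:
    """Return n frequencies from low_hz to high_hz, equally spaced in semitones."""
    if n < 2 or low_hz <= 0 or high_hz <= low_hz:
        raise ValueError("need n >= 2 and 0 < low_hz < high_hz")
    ratio = (high_hz / low_hz) ** (1.0 / (n - 1))
    return [low_hz * ratio ** i for i in range(n)]


def semitones_between(f1_hz: float, f2_hz: float) -> float:
    """Interval from f1_hz up to f2_hz in semitones."""
    return 12.0 * math.log2(f2_hz / f1_hz)


# Placeholder endpoints (assumption): the abstract does not give the real values.
f0_values = semitone_steps(100.0, 200.0, 10)
fsc_levels = range(1, 8)               # FSC 1 (male-typical) to FSC 7 (female-typical)
tilts_db_per_oct = (-18, -14, -10)     # breathy, normal, pressed (from the abstract)

conditions = list(itertools.product(fsc_levels, f0_values, tilts_db_per_oct))
print(len(conditions))  # 210
```

With these placeholder endpoints each step is 12/9 of a semitone; the same function would apply to any pair of endpoints.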
Results: Three main results are evident from the pilot study conducted with one male and one female subject using half (105) of the sound files covering the full ranges of formant space configurations, fundamental frequency, and source spectral tilt. First, increases in either formant space configuration (Figure 1) or fundamental frequency (Figure 1) were positively related to increases in the response of “female” (in the figure, zero represents all male choices). Second, increasing both formant space configuration and fundamental frequency together (Figure 1) was positively related to a higher response of “female.” Third, increases in the steepness of the source spectral tilt (Figure 2) were positively related to a higher response of “female” only at the gender-ambiguous fundamental frequency of 166.78 Hz.
Conclusions: Listeners use both fundamental frequency and formant frequencies to infer speaker gender, and investigation of the salience of source spectral tilt as a cue to speaker gender will continue. The results of this study will increase our understanding of how listeners use aspects of the acoustic speech signal to infer speaker gender. This understanding may guide transgender clients in modifying their communication to better reflect their gender, as well as address other perceptual concepts.
TJ Neuhaus, B.S.; Graduate Student; Department of Communication Sciences and Disorders, Bowling Green State University; 200 Health and Human Services Building, Bowling Green, OH 43403; 1-(616)-730-3280; tjneuha@bgsu.edu
Ronald C. Scherer, Ph.D.; Distinguished Research Professor; Department of Communication Sciences and Disorders, Bowling Green State University; 200 Health and Human Services Building, Bowling Green, OH 43403; 1-(419)-372-7189; ronalds@bgsu.edu

Glottal Attack and Offset Time During Connected Speech in Adductor Spasmodic Dysphonia
Authors:
Corinne Brown, BA(1)
Nicole Heinz(1)
Stephanie RC Zacharias, PhD, CCC-SLP(2)
Dimitar D Deliyski, PhD(1)
Maryam Naghibolhosseini, PhD(1)
1 Michigan State University, East Lansing, MI, USA
2 Mayo Clinic-Arizona, Phoenix, AZ, USA
Abstract:
Objective: Spasmodic dysphonia disrupts laryngeal muscle control during speech and therefore affects the onset and offset of phonation. In this study, high-speed videoendoscopy (HSV) was used to measure the glottal attack and offset times during connected speech for vocally normal adult participants and adults with adductor spasmodic dysphonia (AdSD).
Methods: A monochrome HSV system was used to record readings of the “Rainbow Passage” from vocally normal adults and adults with AdSD. Acoustic recordings were collected simultaneously with the HSV and were transcribed to extract the phonemes and words from the HSV data during connected speech. Several raters visually analyzed the HSV data using playback software (PFV4 Photron FASTCAM Viewer, Photron USA, Inc., San Diego, CA) at a playback speed of 30 frames per second to measure the durations of glottal attack and offset times. The glottal attack time was defined as the time between the first oscillation of the vocal folds and the first contact. The glottal offset time was defined as the time between the last contact of the vocal folds and the last oscillation.
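Given the definitions above, each duration follows from two rater-marked frame indices and the recording frame rate. The sketch below is only an illustration: the abstract gives the playback speed (30 frames per second) but not the recording rate, so the 4000 frames per second used here is an assumption, and the frame indices and the function name `frames_to_ms` are hypothetical.

```python
def frames_to_ms(start_frame: int, end_frame: int, recording_fps: float) -> float:
    """Convert the span between two frame indices into milliseconds."""
    if recording_fps <= 0:
        raise ValueError("recording_fps must be positive")
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    return (end_frame - start_frame) * 1000.0 / recording_fps


RECORDING_FPS = 4000.0  # assumed recording rate; not stated in the abstract

# Hypothetical rater-marked frame indices for one phonation segment.
first_oscillation, first_contact = 1200, 1260
last_contact, last_oscillation = 5000, 5100

attack_ms = frames_to_ms(first_oscillation, first_contact, RECORDING_FPS)
offset_ms = frames_to_ms(last_contact, last_oscillation, RECORDING_FPS)
print(attack_ms)  # 15.0
print(offset_ms)  # 25.0
```

The playback speed affects only how quickly raters step through the video; the durations depend on the recording rate.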
Results: The timestamps for the first oscillation, first contact, last oscillation, and last contact of the vocal folds were determined by the raters. The glottal attack and offset times were successfully measured for different phonemes and words of the connected speech for both vocally normal subjects and patients with AdSD.
Conclusions: Glottal attack and offset times can be used to quantitatively assess the voice production mechanisms in AdSD. These measures can be used as biomarkers to study mechanisms of voice production in vocally normal people and patients with AdSD.
Acknowledgments: The authors would like to acknowledge the support from the NIH through NIDCD grant K01DC017751 and the Michigan State University Discretionary Funding Initiative.

High-speed Video Observations of Vocal Fold Adductory Kinematics while Playing a Clarinet
A Preliminary Investigation of Vocal Fold Kinematics During Clarinet Playing Using Flexible High-Speed Video
Thomas Murry, PhD, CCC-SLP, Professor, Loma Linda University Health.
Wenli Chen, MS, CCC-SLP, Speech-Language Pathologist, Private Practice
Stephanie Zacharias, PhD, Research Associate, Mayo Clinic Arizona
David Lott, MD, Director of Voice Center, Otorhinolaryngology (ENT)/Head and Neck Surgery, Mayo Clinic Arizona
Wind instrumentalists have been known to have hearing disorders, temporomandibular joint disorders, lip disorders, and orofacial changes over the course of years. In addition, several reports exist of voice problems in woodwind players who are also singers (Ocker et al.; Gallivan et al., 2006). Gallivan et al. (2006) observed that, while other supraglottic factors may contribute to voice disorders, vocal fatigue, and respiratory disorders, laryngeal examination with stroboscopy showed no complete adduction of the vocal folds (VF). However, Mukai (1989) reported that the woodwind instrument is played with a narrowed glottis and partially adducted VF and thus may affect phonation.
Flexible high-speed videoendoscopy (fHSV) allows movements of the larynx and VF to be captured at ultrafast rates (Woo, 2017; Naghibolhosseini, 2018). The purposes of this preliminary study were to investigate flexible high-speed technology and VF kinematics in woodwind instrumentalists. Specifically, this study recorded and analyzed the VF motions during phonation and playing in experienced clarinet musicians.
Method
Two experienced adult clarinet musicians underwent flexible endoscopy while playing the clarinet and vocalizing. The fHSV system captured images at a rate of 4000 frames per second. fHSV and audio signals were synchronized. Participants played B-4 and B-5 with legato, sudden, soft, and loud onsets on the clarinet; they vocally produced the same tasks.
Results
Motion of the vocal folds was seen during all samples recorded. In all cases, the VF did not show full adduction. However, HSV demonstrated that positioning of the VF for clarinet production was different from that for vocalization. Vibratory motion was observed for all samples, with the loud and the sudden onset samples showing greater adductory motion than the legato samples.
Conclusion
The use of HSV demonstrates the non-adducted motion of the VF during clarinet playing. The incompletely adducted postures during playing differed from those of phonation.

Horizontal Calibration for a Laser-Projection Transnasal Fiberoptic HSV System
Authors:
Hamzeh Ghasemzadeh, MSc, PhD Candidate in Department of Communicative Sciences and Disorders, Michigan State University; PhD Candidate in Department of Computational Mathematics Science and Engineering, Michigan State University, email: ghasemza@msu.edu
Dimitar D. Deliyski, PhD, Professor and Chair, Department of Communicative Sciences and Disorders, Michigan State University.
Abstract:
Objective: An automated method for horizontal calibration of a laser-projection transnasal fiberoptic high-speed videoendoscopy (HSV) system was developed. The optical principle of the developed laser-calibrated endoscope is to project a grid of 7×7 green laser points across the field of view (FOV) at an angle relative to the imaging axis. Because each laser point had a slightly different angle from the imaging axis, the projected laser beams were not parallel. This implies that the vertical distance would be a confounding factor for horizontal calibration and subsequent horizontal measurements. The objectives of this research were to develop the methodology for horizontal calibrated measurement and to quantify the measurement error of the developed method.
Method: A custom-built transnasal fiberoptic endoscope was used to project a grid of 7×7 green laser points on the FOV and the scene was recorded using a monochrome high-speed camera. The optical design of the laser projection system led to laser beams that were divergent. Therefore, the distance between any two laser points was a function of the working distance. To account for this factor, the horizontal calibration was developed based on two different sets of benchtop recordings. In the first set a calibrated grid paper was used as the target surface, while in the second set a white paper was used instead. The working distance was systematically changed from 5 mm to 35 mm in 1-mm increments for both sets. Recordings from the first set were used to determine the mm-distance between laser points at each working distance. The second set was used for learning the trajectories of each laser point as a function of the working distance. This step was carried out to account for the effect of the working distance on the horizontal distance between laser points. Finally, different line segments with known mm sizes were recorded at different working distances. These recordings were used for measuring the accuracy of the developed method.
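One way to picture how such a calibration could be applied is a lookup table of laser-point spacing versus working distance, interpolated at measurement time to convert pixel lengths into millimeters. The sketch below is not the authors' algorithm: it assumes the working distance has already been estimated, uses a coarse four-row table with synthetic numbers (the study used 1-mm increments), and all names (`interpolate`, `pixels_to_mm`, the table constants) are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation; xs must be ascending with >= 2 entries."""
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("xs and ys need equal length of at least 2")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the calibrated range")
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Synthetic calibration table (assumption), not measurements from the study.
WORKING_DISTANCE_MM = [5.0, 15.0, 25.0, 35.0]
SPACING_MM = [1.0, 2.0, 3.0, 4.0]      # true spacing of adjacent laser points
SPACING_PX = [50.0, 40.0, 30.0, 20.0]  # the same spacing as seen in the image


def pixels_to_mm(length_px: float, working_distance_mm: float) -> float:
    """Scale a pixel length by the mm-per-pixel factor at this working distance."""
    spacing_mm = interpolate(working_distance_mm, WORKING_DISTANCE_MM, SPACING_MM)
    spacing_px = interpolate(working_distance_mm, WORKING_DISTANCE_MM, SPACING_PX)
    return length_px * spacing_mm / spacing_px


print(pixels_to_mm(70.0, 20.0))  # 5.0
```

Because the beams diverge, the mm-per-pixel factor changes with working distance, which is why a single fixed scale would not suffice.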
Results: The results from the experiments demonstrated accurate horizontal measurement using the laser-calibrated endoscope.
Conclusion: The proposed method has resolved a challenge toward horizontal calibrated measurements from in-vivo HSV recordings. It is expected that in-vivo horizontal measurements would improve our knowledge of the vocal fold kinematics and help us make direct evaluation of treatment outcome, e.g., the effect of therapy on lesion size.

Non-linear image distortions in flexible fiberoptic endoscopes and their effects on calibrated horizontal measurements
Hamzeh Ghasemzadeh, MSc, PhD Candidate, Dimitar D. Deliyski, PhD
Authors:
Hamzeh Ghasemzadeh, MSc, PhD Candidate in Department of Communicative Sciences and Disorders, Michigan State University; PhD Candidate in Department of Computational Mathematics Science and Engineering, Michigan State University, email: ghasemza@msu.edu
Dimitar D. Deliyski, PhD, Professor and Chair, Department of Communicative Sciences and Disorders, Michigan State University.
Abstract:
Objective: Laryngeal high-speed videoendoscopy (HSV) and videostroboscopy systems coupled with flexible fiberoptic endoscopes typically have a wide angle of view, which may result in non-linear distortions of the recorded images, also known as the fisheye effect. A second factor leading to non-linear distortions of laryngeal images is the deviation in the imaging angle. The effects of these non-linear distortions on the horizontal measurements of a laser-calibrated endoscope are quantified and analyzed in this study.
Method: A custom-built transnasal fiberoptic endoscope was used to project a grid of 7×7 green laser points on the field of view (FOV) and the scene was recorded using a monochrome high-speed camera. The optical design of the laser projection system allows for calibrated horizontal and vertical measurements. Two different experiments were conducted with this laser-calibrated endoscope. In the first experiment, a perpendicular imaging angle was used to record a rectangular grid at multiple working distances. The analysis compared the length of the blocks in the center to those in the periphery of the FOV. In the second experiment, the imaging angle was varied from -20° to 20° in 5° steps and a rectangular grid at multiple working distances was recorded. The analysis compared the length of the blocks in the center, the left periphery, and the right periphery.
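To give a feel for why the imaging angle alone can make equal-sized blocks appear unequal, the sketch below projects a tilted flat grid through an idealized pinhole camera and compares block lengths at the left, center, and right. This is a simplified geometric assumption and not the authors' analysis: it ignores the wide-angle (fisheye) lens distortion examined in the first experiment, and the working distance, block positions, and function names are hypothetical.

```python
import math


def project(x_mm: float, angle_deg: float, distance_mm: float) -> float:
    """Image coordinate (unit focal length) of a point on a plane tilted about the vertical axis."""
    t = math.radians(angle_deg)
    lateral = x_mm * math.cos(t)               # sideways offset from the optical axis
    depth = distance_mm + x_mm * math.sin(t)   # distance along the optical axis
    return lateral / depth


def block_length(x0_mm: float, x1_mm: float, angle_deg: float, distance_mm: float) -> float:
    """Apparent length of the block spanning x0_mm..x1_mm on the target."""
    return abs(project(x1_mm, angle_deg, distance_mm) - project(x0_mm, angle_deg, distance_mm))


DISTANCE_MM = 20.0  # hypothetical working distance

for angle in range(-20, 21, 5):  # -20° to 20° in 5° steps, as in the abstract
    left = block_length(-6.0, -4.0, angle, DISTANCE_MM)
    center = block_length(-1.0, 1.0, angle, DISTANCE_MM)
    right = block_length(4.0, 6.0, angle, DISTANCE_MM)
    print(angle, round(left / center, 3), round(right / center, 3))
```

Under this model the three 2-mm blocks appear equal only at 0°; at any other angle the side tilted toward the camera appears longer and the other side shorter, so the error is not a single scale factor.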
Results: Results of the first experiment showed a significant effect of the location of the FOV on the horizontal measurement. The second experiment showed a significant effect of imaging angle on the horizontal measurement.
Conclusion: The results of this work have underscored the effect of two important non-linear distortions on the laryngeal images. Considering the broad usage of subjective visual evaluations of the vocal folds (e.g., phase asymmetry, bowing, vocal fold edges) and possible applications of calibrated horizontal measurement in diagnosis and functional assessment of the vocal folds (e.g., lesion size, length of the vocal folds, kinematic measurements), these non-linear distortions could introduce significant errors unless they are accurately compensated for.

Spatial Segmentation of Glottal Area in High-Speed Videoendoscopy during Connected Speech
Ahmed M. Yousef, M.S., Ph.D. Student, Department of Communicative Sciences and Disorders, Michigan State University, Tel: 517-353-8641, email: yousefah@msu.edu
Dimitar D. Deliyski, Ph.D., Professor and Chair, Department of Communicative Sciences and Disorders, Michigan State University, Tel: 517-353-8780, email: ddd@msu.edu
Stephanie R.C. Zacharias, Ph.D., CCC-SLP, Research Scientist, Laryngotracheal Regeneration Lab, Mayo Clinic – Arizona, Tel: 480-301-4837, email: Zacharias.Stephanie@mayo.edu
Alessandro de Alarcon, MD, MPH, Professor, Division of Pediatric Otolaryngology, Cincinnati Children’s Hospital Medical Center, and Department of Otolaryngology – Head and Neck Surgery, University of Cincinnati College of Medicine, Tel: 513-636-4355, email: alessandro.dealarcon@cchmc.org
Robert F. Orlikoff, Ph.D., CCC-SLP, Professor and Dean, College of Allied Health Sciences, East Carolina University, Tel: 252-744-6010, email: orlikoffr16@ecu.edu
Maryam Naghibolhosseini, Ph.D., Assistant Professor, Department of Communicative Sciences and Disorders, Michigan State University, Tel: 517-884-2256, email: naghib@msu.edu
-----------
Abstract: This study proposes a new computational framework for spatial segmentation of the glottal area in high-speed videoendoscopy (HSV) data during connected speech. The aim is to provide an accurate estimate of the glottal area waveform during vocal-fold vibrations in connected speech. HSV data were obtained from a vocally normal adult during production of the “Rainbow Passage.” An algorithm based on the active contour modeling approach was developed for the analysis of HSV data with high noise levels. The noise present in the HSV recordings was modeled in the developed computational framework. The new algorithm was applied to a series of HSV kymograms at different intersections of the vocal folds in order to detect the edges of the vocal folds. This edge detection method follows a set of deformation rules for energy optimization and eventually converges to the vocal fold edges during connected speech. The detected edges in the kymograms were then registered back to the HSV frames. The glottal area waveform was calculated based on the area of the glottis in each frame. The developed algorithm successfully described the edges of the vocal folds in the HSV kymograms. This algorithm captured the glottal area across the HSV frames and led to accurate measurement of the glottal area waveform. The proposed algorithm can serve as an accurate approach for spatial segmentation of the vocal folds in HSV data during connected speech. This study is one of the initial steps toward developing HSV-based measures to study the mechanisms of voice production and voice disorders in the context of connected speech.
Acknowledgments: The authors would like to acknowledge the support from the NIH through NIDCD grant K01DC017751, and the Michigan State University Discretionary Funding Initiative.
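The last step of the pipeline described in the abstract, computing the glottal area waveform from the area of the glottis in each frame, can be sketched as follows. The array layout (left and right edge positions per frame and per kymogram line), the function name, and the toy data are assumptions for illustration; the active-contour edge detection itself is not reproduced here.

```python
import numpy as np

def glottal_area_waveform(left_edges, right_edges, line_spacing=1.0):
    """Glottal area per frame, in pixel units.

    left_edges, right_edges: arrays of shape (n_frames, n_lines) holding the
    detected edge positions on each kymogram line (hypothetical layout).
    """
    left = np.asarray(left_edges, dtype=float)
    right = np.asarray(right_edges, dtype=float)
    # Glottal width on each line; crossed edges are treated as full closure
    widths = np.clip(right - left, 0.0, None)
    # Rectangle-rule sum of the widths across the kymogram lines
    return widths.sum(axis=1) * line_spacing

# Toy example: 3 frames, 4 kymogram lines
left = np.array([[10, 10, 10, 10],
                 [ 9,  8,  8,  9],
                 [10, 10, 10, 10]])
right = np.array([[10, 10, 10, 10],
                  [11, 12, 12, 11],
                  [10, 11, 11, 10]])
gaw = glottal_area_waveform(left, right)  # one area value per frame
```

Summing widths line by line is the simplest choice; a polygon-area calculation on the registered contour would be an alternative when the edges are dense enough.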

Vocal Fold Kinematics and Relative Fundamental Frequency as a Function of Obstruent Type
*Yeonggwang Park, BA, †Feng Wang, BS, *Manuel Díaz-Cádiz, MS, *†Jennifer M. Vojtech, BS, *†Matti D. Groll, BS, and *†‡Cara E. Stepp, PhD
*Department of Speech, Language, and Hearing Sciences, Boston University
†Department of Biomedical Engineering, Boston University
‡Department of Otolaryngology – Head and Neck Surgery, Boston University School of Medicine
The acoustic measure, relative fundamental frequency (RFF), has been proposed as an objective metric for assessing vocal hyperfunction; however, its underlying physiological mechanisms have not yet been fully characterized. This study aimed to characterize the relationship between RFF and vocal fold kinematics. Simultaneous acoustic and high-speed videoendoscopic (HSV) recordings were collected as typical speakers repeated the utterances /ifi/ and /iti/. RFF values at voicing offsets and onsets surrounding the obstruents were estimated from acoustic recordings, whereas glottal angles, durations of voicing offset and onset, and a kinematic estimate of laryngeal stiffness (KS) were obtained from HSV images. RFF did not differ between the two obstruents at voicing offset; however, fricatives necessitated larger glottal angles and longer durations to devoice. RFF values were lower and glottal angle values were greater for stops relative to fricatives at voicing onset. KS values were greater in stops relative to fricatives. The less adducted, stiffer vocal folds and lower RFF at voicing onset for stops relative to fricatives in this study were in accordance with prior speculations that decreased vocal fold contact area and increased vocal fold stiffness may decrease RFF.
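For readers unfamiliar with the measure, RFF is conventionally expressed in semitones relative to a steady-state reference cycle. The sketch below shows that conversion only. The choice of reference cycle (the cycle farthest from the obstruent), the function name, and the example frequencies are assumptions for illustration and are not taken from this study's data.

```python
import math

def rff_semitones(cycle_f0_hz, reference_f0_hz):
    """Convert per-cycle f0 values (Hz) to semitones re: a reference f0."""
    return [12.0 * math.log2(f / reference_f0_hz) for f in cycle_f0_hz]

# Hypothetical f0 values (Hz) for ten voicing-offset cycles: the first cycle
# is the steady-state reference, the last cycle is closest to the obstruent
offset_f0 = [200.0, 200.0, 199.5, 199.0, 198.5,
             198.0, 197.0, 196.0, 195.0, 193.0]
offset_rff = rff_semitones(offset_f0, reference_f0_hz=offset_f0[0])
```

In this toy series the values fall below zero as the cycles approach the obstruent, meaning f0 drops relative to the steady-state vowel.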

Voice Source and Articulation of Overtone Singing: A Case Study
Johan Sundberg, PhD, KTH, SMI, Stockholm University
Anna Maria Hefele, Singer
Björn Lindblom, PhD, Stockholm University
ABSTRACT
Overtone singing is produced by a single artist in such a way that two simultaneously sounding pitches are perceived, a drone plus a strong overtone. Acoustically the spectrum typically contains a complete set of harmonic partials, one of which is 10 to 20 dB stronger than its neighbors.
According to Bloothooft and associates [1], the boosting of the overtone can be explained as the consequence of clustering the second and third formants. Klingholz [2] pointed out that the effect can be explained by a super-narrow bandwidth of a formant coinciding with the enhanced partial. It has also been suggested that non-linear source-filter interaction plays an important role.
The acoustic characteristics of vocal sounds are determined by the voice source, which is produced by the pulsating glottal airflow, and the sound transfer properties of the vocal tract. The aim of the present case study was to find out to what extent source-filter interaction or a super-narrow formant bandwidth is necessary to explain the production of overtone singing.
Examples of overtone singing were recorded in which a professional singer specializing in this technique (coauthor AMH) enhanced partials number 4 to 10. The recording was analyzed by inverse filtering, which does not model any source-filter interaction. The results showed that the radiated spectrum could be explained as a consequence of clustering the second and third formants close to the enhanced partial, as suggested by Bloothooft et al. [1]; the source spectrum envelope did not show any enhancement of this partial. In other words, neither source-filter interaction nor a super-narrow formant bandwidth was needed to explain this overtone singing within the limits of normal voice production.
Inspection of an Internet MR movie documentation of the same artist [3] shows that she produced the overtone singing with an elevated tongue tip, the position of which moved anteriorly with increasing frequency of the prominent overtone. In this way, a cavity is formed anterior to the tongue tip, complemented by a narrow lip opening, i.e., a Helmholtz-type resonator. Such a front cavity has been shown to typically determine the frequency of the third formant [4]. Attempts will be made to test the assumption that this overtone singer’s front cavity can be approximated as a resonator with a resonance frequency just above the frequency of the enhanced partial.
The present results support the conclusion that overtone singing can be interpreted as a purely resonatory phenomenon produced by clustering the second and third formants, with normal bandwidths, just below and just above the enhanced partial.
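The front-cavity idea can be illustrated with the textbook formula for an ideal Helmholtz resonator, f = (c / 2π) · sqrt(A / (V · L)), where A and L are the area and effective length of the neck (here the lip opening) and V is the cavity volume. The dimensions, the speed of sound, and the drone frequency below are hypothetical values chosen for illustration; they are not measurements of the singer.

```python
import math

def helmholtz_frequency(neck_area_m2, neck_length_m, cavity_volume_m3, c=350.0):
    """Resonance frequency (Hz) of an ideal Helmholtz resonator.

    c is the assumed speed of sound in warm, humid air (m/s).
    """
    return (c / (2.0 * math.pi)) * math.sqrt(
        neck_area_m2 / (cavity_volume_m3 * neck_length_m))

def partial_frequencies(f0_hz, first=4, last=10):
    """Frequencies of harmonic partials first..last of a drone at f0_hz."""
    return {n: n * f0_hz for n in range(first, last + 1)}

# Hypothetical front cavity: 0.3 cm^2 lip opening, 1 cm long, 3 cm^3 volume
f_front = helmholtz_frequency(0.3e-4, 0.01, 3e-6)

# Hypothetical drone at 220 Hz; partials 4 to 10 as in the recordings
partials = partial_frequencies(f0_hz=220.0)
```

With these assumed dimensions the front-cavity resonance comes out at roughly 1.76 kHz, close to the 8th partial of a 220 Hz drone; changing the cavity volume shifts the resonance toward other partials.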
Johan Sundberg, PhD, Dept of Speech Music Hearing, School of Electrical Engineering and Computer Science, KTH, SE-10044 Stockholm, Sweden, and University College of Music Education Stockholm, jsu@kth.se, +46707407873
Anna-Maria Hefele, Schärdingerstr. 28, A-4774 St. Marienkirchen, Austria, +43 660/2011509, ama@annamaria-hefele.com
Björn Lindblom, PhD, Dept of Linguistics, Stockholm University, SE-106 91 Stockholm, Sweden, +705403669, lindblom@ling.su.se
References
[1] Bloothooft G, Bringmann E, van Cappellen M, van Luipen JB, Thomassen KP. Acoustics and perception of overtone singing. J Acoust Soc Am. 1992 Oct;92(4 Pt 1):1827-36.
[2] Klingholz F. Overtone singing: productive mechanisms and acoustic data. J Voice. 7(2):118-122.
[3] MR movie produced at the Freiburger Institut für Musikermedizin by Prof. B. Richter, Prof. M. Echternach, and Dr.-Ing. M. Burdumi, available at https://youtu.be/-jKl61Xxkh0
[4] Sundberg J, Lindblom B. Acoustic estimation of the front cavity in apical stops. J Acoust Soc Am. 1990;88:1313-1317.