RESEARCH

OVERVIEW

English, the modern lingua franca, is spoken by approximately 400 million native speakers, with major concentrations in North America, the British Isles, Australia, and New Zealand. Owing largely to increased mobility and globalization, there is an additional worldwide surge in non-native English use. The diversity of native and non-native English accents has created a new context for investigating the current global development of English pronunciation patterns.

At the Speech Perception and Acoustics Lab, we focus on documenting and characterizing variation in American English. We use quantitative approaches to reveal relationships among the pronunciation variants used by regional dialects, ethnic groups, and non-native speakers in the United States, and to trace how these pronunciation features are acquired and maintained over the human lifespan. Our experimental-behavioral approach aims to illuminate the cognitive mechanisms underlying effective oral communication, which relies on the human ability to perceive and interpret both predictable and unexpected variation in American English speech. A great deal of our work has explored the nature of acoustic variation in vowels across several generations of speakers from different dialect regions in the United States, and the temporal organization of regionally accented speech, including segmental durations and speech tempo.

We have recently turned to neuroscience research to deepen our understanding of the mechanisms and processes involved in efficient auditory processing of variation in speech. Verbal communication is an interactive process between speaker and listener, and current technological and methodological advances in brain-based research allow us to study the interaction of neural activity between two (or more) brains, not only neural processes within an individual. Our collaboration with The Center for Cognitive and Behavioral Brain Imaging (CCBBI) at Ohio State has led us to use functional near-infrared spectroscopy (fNIRS) techniques to uncover processes involved in social (human-to-human) interaction. This research program aims to define communicative success in the context of socially conditioned variation in speech, and to distinguish acceptable forms of variation from less optimal ones, taking into account the linguistic and health-related limitations of speaker and listener.

Our behavioral and brain-based research also provides the basis for a better understanding of auditory processing in speech-language disorders in childhood and adulthood. Recently, we found that phonological impairment in dyslexia is associated with deficits in using the acoustic information that cues the regional accent and gender of the speaker, an ability that critically depends on extracting and interpreting consistent spectral and temporal variations. We also learned that Williams syndrome, a genetic disorder, impairs the ability to reproduce local pronunciation features that are abundantly present in the speech of typically developing children and typical adults in a shared speech community. These and related findings reinforce the need for more sensitive measures and assessment tools to evaluate auditory processing abilities in speech-language disorders and developmental delays, and one of the ultimate goals of our research is to enhance clinical assessment and diagnosis.

CURRENT PROJECTS

Sociocultural learning in late childhood

In a series of acoustic and perception studies, we have asked how children from different regional dialect backgrounds learn pronunciation patterns from those around them. We target 8- to 12-year-olds who are growing up either in stable linguistic environments where local dialect features have remained unchanged (such as selected speech communities in Ohio and Wisconsin) or in changing environments where traditional local features are gradually fading in favor of more standardized forms (such as the Appalachian regions of North Carolina). The current acoustic study examines variation in the amount of stop closure voicing in intersonorant position. Earlier, we proposed that the nature of stop closure voicing is a systematic feature differentiating northern and southern speech. Now we seek to determine both the developmental stability of voiced stops in late childhood and the influence of regional variation on the amount of voicing in stop closures. Sentence productions from 48 girls from Central Ohio, Western North Carolina, and Southeastern Wisconsin are analyzed for a set of temporal variables and compared with productions from matched adults. Based on literature showing that mastery of lexical stress contrastivity continues into adolescence, we hypothesize that systematic variation in stop closure voicing is commensurate with the development of stress control; this relationship is further mediated by the regional variation documented in the speech of adults.
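
As an illustration, the sketch below shows one way this measurement could be made: the voiced fraction of a stop closure estimated from frame-level pitch tracking. It assumes hand-labeled closure onset and offset times and the parselmouth (Praat) library; the file name and label times are hypothetical, and our full analysis involves additional temporal variables.

    # Estimate the voiced fraction of a stop closure from frame-level
    # pitch tracking (a 0 Hz frame in the Praat pitch track = unvoiced).
    import numpy as np
    import parselmouth

    def closure_voicing_fraction(wav_path, closure_on, closure_off):
        snd = parselmouth.Sound(wav_path)
        pitch = snd.to_pitch(time_step=0.005)         # 5 ms analysis frames
        times = pitch.xs()
        f0 = pitch.selected_array['frequency']        # 0 Hz = unvoiced frame
        in_closure = (times >= closure_on) & (times <= closure_off)
        voiced = f0[in_closure] > 0
        return voiced.mean() if voiced.size else np.nan

    # Hypothetical closure interval from 0.412 to 0.486 s:
    print(closure_voicing_fraction("talker01_sent03.wav", 0.412, 0.486))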

Brain-to-brain synchrony in assessing listening effort

Evidence from the neuroscience of verbal communication shows that when two people share information (one speaks and the other listens), their brain activities work in synchrony. This brain-to-brain synchrony is lost when the listener fails to understand the speaker. We test the hypothesis that brain-to-brain synchrony predicts the level of effort involved in auditory processing. In particular, stronger neural brain-to-brain coupling indicates less effort and results in better understanding; conversely, the weaker the coupling, the greater the effort, and the worse the processing and comprehension. We propose that examining the time course and nature of neural activity affords a more sensitive assessment of listening effort than is currently available from behavioral measures. Using functional near-infrared spectroscopy (fNIRS) and fNIRS-based hyperscanning, we analyze patterns of neural activity separately in the speaker and in the listener, and statistically assess the correspondence in their brain activation (the degree of synchronized activation of cortical sites and temporal symmetry). We examine the effects of degrading the auditory signal and of varying the speakers' regional dialect and foreign accent. We predict the strongest coupling and the shortest time delay when the accent of the listener matches that of the speaker, followed by regional dialect mismatch and foreign-accent mismatch, respectively.
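
As a simplified illustration of the coupling measure, the Python sketch below computes a lagged Pearson correlation between one speaker channel and one listener channel. It assumes preprocessed, equal-length hemoglobin time series sampled at a common rate and longer than the lag window; the function and variable names are hypothetical, and our actual analyses use fNIRS-specific pipelines.

    # Lagged cross-correlation between speaker (x) and listener (y) signals.
    import numpy as np

    def lagged_coupling(x, y, fs, max_lag_s=10.0):
        """Return (lag in seconds, peak Pearson r); positive lag means
        the listener's activity follows the speaker's."""
        x = (x - x.mean()) / x.std()
        y = (y - y.mean()) / y.std()
        max_lag = int(max_lag_s * fs)
        corrs = []
        for k in range(-max_lag, max_lag + 1):
            if k >= 0:                    # listener lags speaker by k samples
                r = np.corrcoef(x[:len(x) - k], y[k:])[0, 1] if k < len(x) else np.nan
            else:                         # listener leads speaker
                r = np.corrcoef(x[-k:], y[:k])[0, 1]
            corrs.append(r)
        corrs = np.array(corrs)
        best = int(np.nanargmax(corrs))
        return (best - max_lag) / fs, corrs[best]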

The emergence of voice gender cues in children’s speech

As children grow older, physiological differences between boys and girls become noticeable along multiple acoustic dimensions in speech. In this project, we focus on fundamental frequency (f0), the primary cue to the male-female distinction in voice perception. Using children's recordings from a large corpus, we measure f0 in children ranging in age from 7 to 13 years who grew up in three different geographic regions of the United States. The speech materials include words produced in isolation, in read sentences, and in free conversation. These utterances are then used as stimuli in perceptual gender identification experiments. Our interest is in the effects of stimulus type (controlling for f0 ranges in shorter and longer utterances) and social factors (controlling for the regional background of both speakers and listeners) on the perception of voice gender in children's productions. Based on the existing literature, we expect detectable male-female differences to begin to emerge in 8-year-olds. However, the constancy of voice gender cues amid stimulus and dialect variability will likely reveal itself at an older age.
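
For illustration, a minimal sketch of a per-utterance f0 summary follows, again assuming the parselmouth (Praat) pitch tracker. The floor and ceiling settings are assumptions suited to children's higher-pitched voices, and the file name is hypothetical.

    # Summarize the f0 distribution of one utterance.
    import numpy as np
    import parselmouth

    def f0_summary(wav_path, floor=120, ceiling=500):
        snd = parselmouth.Sound(wav_path)
        pitch = snd.to_pitch(pitch_floor=floor, pitch_ceiling=ceiling)
        f0 = pitch.selected_array['frequency']
        f0 = f0[f0 > 0]                               # drop unvoiced frames
        return {"median_f0": np.median(f0),
                "f0_range": np.percentile(f0, 95) - np.percentile(f0, 5)}

    print(f0_summary("child_age9_word12.wav"))        # hypothetical file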

The contribution of high frequency energy to speech perception

New research evidence shows that information in the high-frequency region contributes to the perception of speech and voice and has the potential to enhance speaker and word recognition in noise. Identifying reliable cues at the high-frequency end of the speech spectrum has further implications for the development of cochlear implants, hearing aids, cell phones, and other communication technologies that are just now beginning to exploit this range. Because almost all cues to speech intelligibility are contained within the low-frequency region, high-pass filtering has not been used in speech perception research as often as low-pass filtering. In this project, we use high-pass filtering to determine whether information about talker dialect and gender is available in the high-frequency region even in the absence of intelligibility cues. With the upper frequency limit set at 11 kHz, sentences excerpted from spontaneous conversations of 20 talkers from Ohio and 20 from North Carolina are high-pass filtered at cut-offs ranging from 0.7 to 5.56 kHz and presented to listeners from Ohio. Preliminary results show that listeners are still sensitive to differences between the two dialects at the two highest cut-offs, 3.32 kHz and 5.56 kHz, and that dialect identification is mediated by talker gender. Although speech intelligibility is greatly reduced above 3 kHz, the results suggest that residual dialect cues are distributed in the high-frequency region and are preserved differently in male and female voices.
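
The filtering step itself is straightforward; below is a minimal Python sketch using SciPy, with the 3.32 kHz cut-off as an example. The Butterworth filter and its order are illustrative assumptions, not a description of our exact stimulus preparation, and the file names are hypothetical.

    # High-pass filter one sentence recording (assumes a mono signal).
    import soundfile as sf
    from scipy.signal import butter, sosfiltfilt

    audio, fs = sf.read("sentence.wav")               # hypothetical file
    sos = butter(8, 3320, btype="highpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, audio)                # zero-phase filtering
    sf.write("sentence_hp3320.wav", filtered, fs)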

Dialect cues in low-pass filtered speech

There is mounting evidence that regional dialects can differ in how they use prosody, including rhythm, intonation, pitch range, speaking rate, and pausing. We explore the contribution of prosodic cues to perceptual dialect identification by systematically removing segmental and semantic content from speech using low-pass filtering. Low-pass filtered speech, which sounds as if it were coming through a thick wall, retains the lower-frequency acoustic energy, including the tonal quality of the voice, and thereby preserves prosodic aspects of speech. We examine (1) how a series of progressively higher filter cut-offs influences listeners' perception of speaker dialect, and (2) which filter setting is optimal for removing semantic content while retaining most of the dialect-related prosodic information.
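
A minimal sketch of generating such a filter series appears below, complementing the high-pass sketch above; the cut-off values are illustrative, not the exact settings under study, and the file names are hypothetical.

    # Generate a series of progressively higher low-pass versions of one clip.
    import soundfile as sf
    from scipy.signal import butter, sosfiltfilt

    audio, fs = sf.read("conversation_clip.wav")      # hypothetical file
    for cutoff in (300, 400, 500, 700, 1000):         # Hz, illustrative
        sos = butter(8, cutoff, btype="lowpass", fs=fs, output="sos")
        sf.write(f"clip_lp{cutoff}.wav", sosfiltfilt(sos, audio), fs)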

Spectral and temporal resolution in dyslexia

Auditory research in dyslexia proposes that deficient auditory processing of speech underlies difficulties with reading and spelling. Focusing predominantly on phonological processing, this research has not yet explored the role of the social context in which phonological representations are formed. We assess auditory processing of regional dialect (as spoken in Ohio and North Carolina) and voice gender cues using filtered speech. In the first study, we aim to determine whether dyslexia reduces the ability to process two kinds of dialect cues, segmental and suprasegmental, and to recognize voice gender. In addition to the original unprocessed speech, there are two filtered-speech conditions (low-pass filtering at 400 Hz and 8-channel noise-vocoding). In the second study, the dialect and gender cues are conveyed by the amplitude (temporal) envelope of noise-vocoded speech, but spectral resolution is manipulated by systematically increasing the number of noise bands from 4 to 12. In both studies, children and adults with dyslexia from Ohio and their age-matched controls respond to 360 unique sentences extracted from spontaneous conversations of 40 speakers. The results thus far have shown that both groups with dyslexia were significantly less sensitive to dialect and gender cues than the controls. This performance can be linked to their comparatively poorer temporal and spectral resolution at the auditory periphery.
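
Because noise-vocoding figures in both studies, the sketch below outlines a minimal n-channel noise vocoder in Python. The log-spaced band edges, envelope smoothing cut-off, and filter order are illustrative assumptions rather than the parameters of our stimuli; a mono float signal is assumed, and real stimuli would also need loudness matching.

    # Minimal n-channel noise vocoder: replace each band's fine structure
    # with band-limited noise modulated by that band's amplitude envelope.
    import numpy as np
    import soundfile as sf
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(audio, fs, n_bands, lo=80, hi=8000):
        edges = np.geomspace(lo, hi, n_bands + 1)     # log-spaced band edges
        env_sos = butter(4, 30, btype="lowpass", fs=fs, output="sos")
        out = np.zeros_like(audio)
        for f1, f2 in zip(edges[:-1], edges[1:]):
            band_sos = butter(4, (f1, f2), btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(band_sos, audio)
            env = sosfiltfilt(env_sos, np.abs(hilbert(band)))  # smoothed envelope
            carrier = sosfiltfilt(band_sos, np.random.randn(len(audio)))
            out += np.clip(env, 0, None) * carrier
        return out / np.max(np.abs(out))              # normalize peak level

    audio, fs = sf.read("sentence.wav")               # hypothetical file
    sf.write("sentence_voc8.wav", noise_vocode(audio, fs, 8), fs)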

Amplitude envelope onsets in speech rhythm

Recent research in auditory neuroscience suggests that neuronal oscillations entrain to amplitude modulation in the speech signal at different temporal rates. Modulations at slower rates (the "syllable rates") appear to be important for the perception of rhythm. In particular, the onsets of modulations in the amplitude envelope and their rate of change (also called "rise times") vary from syllable to syllable and are greatest in stressed syllables. In this exploratory project, we examine variation in the rise time of the amplitude envelope in stressed syllables occurring in focal position in a sentence. In this highly controlled environment (in terms of the phonetic properties of the syllable, sentence content, and the position of the sentence focus), we seek to determine how syllable onset rise time varies as a function of vowel category (for inherently short and long vowels) and speaker dialect. The same speech material was read by 60 speakers, men and women, matched for age and representing three regional dialects. Given the documented systematic cross-dialectal variation in the acoustic properties of vowels as well as in articulation rate, we predict that dialect will also influence syllable rise time. If this is supported, amplitude envelope onsets may serve as an important marker in the perception of rhythmic differences among regional and ethnic varieties of American English.
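
As an illustration, the sketch below estimates the rise time of the amplitude-envelope onset for one hand-labeled syllable. The Hilbert-envelope method and the 10-90% rise criterion are assumptions for illustration, not necessarily our exact measurement protocol; the function name and boundary times are hypothetical.

    # Rise time of the amplitude-envelope onset within a labeled syllable.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def onset_rise_time(audio, fs, syl_on, syl_off):
        seg = audio[int(syl_on * fs):int(syl_off * fs)]
        env_sos = butter(4, 10, btype="lowpass", fs=fs, output="sos")
        env = sosfiltfilt(env_sos, np.abs(hilbert(seg)))   # smooth envelope
        peak = np.argmax(env)
        rise = env[:peak + 1]
        t10 = np.argmax(rise >= 0.1 * env[peak])      # first 10% crossing
        t90 = np.argmax(rise >= 0.9 * env[peak])      # first 90% crossing
        return (t90 - t10) / fs                       # rise time in seconds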

Temporal organization in African American English

African American English (AAE) is a distinct ethnicity-based variety spoken by about 39 million people in the United States. Given the general shortage of acoustic-phonetic evidence, there are conflicting views on the current development of AAE. For example, AAE speakers have variously been reported to participate in regional sound change, to innovate sound change, or to resist it. Especially little is known about the temporal organization of AAE, including segmental durations, speech rate, and rhythm. The aim of this research is to characterize temporal variation in AAE across different geographic regions in the United States. These data will provide acoustic evidence for a possibly distinctive use of temporal variation in AAE, which may also supply a salient perceptual cue to ethnic identification.

Non-native processing of regional variation in American English 

Native listeners typically outperform non-native listeners in experiments testing the intelligibility of American English dialects. Presumably, second language (L2) speakers are less sensitive to the fine-grained acoustic-phonetic details present in dialect-inherent pronunciation patterns. This project seeks to characterize the selective listening of L2 speakers, assessing the salience of regional pronunciation features in L2 perception. Specifically, which acoustic-phonetic cues salient to native listeners are also salient to L2 listeners? Which cues are ignored by L2 listeners, and why? L2 listeners from several native language (L1) backgrounds, children and adults, are presented with a range of regional accents and speaker voices to determine how they encode the auditory input and how they decode the linguistic message from variable sources. Ultimately, the project will help define the nature of the L2 speech processing deficit and the conditions for developing sensitivity to regional variation in L2 listeners.