Speech is riddled with variation, a fact that goes unnoticed by most speakers and listeners. Each time a word is produced, it may be uttered by a different person in a different context, with a different duration and amplitude, and so on. With so much variation, the speech signal carries a great deal of information. Speech carries linguistic content (sounds, words, and their structure), but it also carries abundant social information, especially about the talker. Talkers’ voices and ways of speaking often reveal their gender, age, ethnicity, socio-economic status, geographical origin, personality, and emotional state, to name just a few. Work on phonetic variation in spoken language processing has focused mostly on the mapping of the variable signal to sounds and words, with much less attention to the role of phonetically cued social/talker variation.
This dissertation investigates the effect of phonetically cued emotional information (i.e., emotional prosody) on spoken word recognition. Even words whose meanings are not emotionally laden (e.g., pineapple) can be uttered in a way that conveys anger, happiness, or sadness through phonetic modulation, and the current work investigates how this phonetic variation in the speech signal affects the way the spoken word is perceived, recognized, and ultimately understood. To investigate systematically the effects of emotional prosody on the spoken word recognition process, this dissertation asks three specific questions: (1) Does a non-emotional word produced with emotional prosody (e.g., pineapple in angry prosody) facilitate the recognition of an emotionally congruent word (e.g., mad, upset)? (2) Does a non-emotional word produced with emotional prosody (e.g., pineapple in angry prosody) facilitate or hinder the recognition of a word that is semantically related to the word that carries the prosody (e.g., fruit)? (3) Does the effect of emotional prosody on lexical processing change depending on the prosodic context?
In a series of experiments, I found the following: (1) Pineapple uttered in angry prosody activates strong associates of the emotion, such as mad and upset. (2) Pineapple in angry prosody primes the semantically related word fruit as well as or even better than pineapple in neutral prosody, while pineapple in happy prosody performs worse than pineapple in neutral prosody. This finding highlights that atypical, infrequent forms do not necessarily impede the recognition process. (3) Words uttered in emotional prosody showed consistent semantic priming patterns whether they appeared in a within-prosody context, a remotely mixed condition, or an immediately mixed condition, whereas words uttered in neutral prosody were the most sensitive to varying prosodic contexts. These results present challenges to existing theories and call for additional mechanisms to fully account for complex listener behavior. I argue for three such mechanisms based on a recent proposal made in Sumner, Kim, King & McGowan (2014): socioacoustic encoding, interactivity, and social-weighting. Simply put, this view suggests that the speech signal is simultaneously mapped to social representations (via socioacoustic encoding) as well as to linguistic representations. Socioacoustic encoding activates social features and categories (e.g., information about the speaker’s age, gender, and emotional state) early in lexical processing. This social information influences the spoken word recognition process by interacting with lexical information (interactivity) and by modulating attention allocation to the speech signal (social-weighting). By demonstrating the crucial influence of emotional prosody on word recognition, this work significantly broadens our current knowledge of the spoken word recognition process.
(The format for this open part of the oral exam is a 30-45 minute talk by the Ph.D. candidate followed by questions from those attending, for a total of no more than 75 minutes. Please arrive promptly!)
University oral exam chair: Ray McDermott (Education)