Phonetics and Phonology in Europe 2015

Invited speakers

Temporal domains of prosody and gesture: from syllables and beats to talk spurts and gesture units
David House (Department of Speech, Music and Hearing, KTH, Stockholm)

There is currently an increasing interest in the interaction between speech and gesture, and in particular the temporal relationship between prosody and gesture [1]. Kendon [2], followed by McNeill [3], divided gestures into gesture units, gesture phrases and gesture phases. The gesture unit, the longest temporal domain, is the interval of gestural movement bounded by periods of non-movement. A gesture unit comprises one or more gesture phrases, each of which can be divided into a sequence of gesture phases. The stroke phase of a gesture phrase is particularly interesting in terms of prosody. Some of these strokes (also called “beat” gestures) often coincide and appear to be synchronized with prosodic and intonational peaks related to prominence, such as pitch accents [4] [5]. Synchronization between the phrase level of intonation and gesture phrases has also been studied, but has been found to be looser than the synchrony between the stroke phase and pitch accents [4] [6].

I will report on the results of two preliminary studies using automatic methods and motion capture techniques to extract head nods and hand motion from spontaneous Swedish dialogues. The first study investigated head nods with the syllable as the temporal domain [7]. The head nods were quantified and their alignment with syllable onset and syllable nucleus was measured. The syllables co-occurring with head nods showed greater intensity, higher F0 and greater F0 range than the mean across the entire dialogue. The majority of these syllables belonged to words bearing a focal accent. The peak rotation of the nod was generally aligned with the stressed syllable, but there was considerable variation in fine temporal synchronization. On average, the beat gesture occurred slightly ahead of the syllable, which is consistent with the literature on temporal synchronization of co-speech gestures.
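As an illustration only (this is not the procedure reported in [7]), the alignment measure can be sketched as the signed lag between each nod's peak rotation and the nearest syllable onset. The function name `nearest_lags` and all time stamps are invented for the example; a negative lag means the nod peak comes before the syllable onset.

```python
from bisect import bisect_left


def nearest_lags(nod_peaks, syllable_onsets):
    """Signed lag in seconds from each nod peak to its nearest syllable onset.

    Negative values mean the nod peak precedes the syllable onset.
    """
    if not syllable_onsets:
        raise ValueError("at least one syllable onset is required")
    onsets = sorted(syllable_onsets)
    lags = []
    for t in nod_peaks:
        i = bisect_left(onsets, t)
        # Only the onsets just before and just after t can be the nearest.
        candidates = onsets[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda s: abs(s - t))
        lags.append(t - nearest)
    return lags


# Invented time stamps (seconds), for illustration only.
nod_peaks = [0.48, 1.95, 3.10]
syllable_onsets = [0.50, 1.20, 2.00, 3.05]

lags = nearest_lags(nod_peaks, syllable_onsets)
mean_lag = sum(lags) / len(lags)
print([round(lag, 3) for lag in lags], round(mean_lag, 4))
```

The same function could be applied to syllable nuclei instead of onsets. A mean lag below zero across many nods would correspond to the gesture-leads-syllable tendency described above.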

The second study investigated hand motion with the talk spurt (defined as continuous speech between periods of silence) as the temporal domain. The extracted hand motion was divided into discrete segments corresponding to gesture units, defined as the time period from one resting position to the next. The alignment between the onsets of these gesture units and automatically extracted talk spurt sequences was measured. Gestures co-occurred with up to two-thirds of the talk spurts, depending on the speaker and the dialogue. Although there was considerable variation in onset synchrony between the talk spurts and the gesture units, there was a general tendency for the talk spurts to slightly precede the gesture units, thus showing a timing trend contrary to that appearing between head motion and the syllable. Beat gestures can thereby be seen to share the time domain of the syllable, while gesture units share the time domain of the talk spurt. Moreover, this could indicate that in a global temporal domain, speech precedes gesture, while in the local domain of the syllable, gesture precedes speech. I will discuss the implications of these findings in terms of motor activation and temporal domains of prosody and gesture.
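A minimal sketch of this kind of measurement, under simplifying assumptions that are not taken from the study: speech activity and hand movement are given as frame-wise 0/1 sequences, any silent frame ends a talk spurt (a real system would probably require a minimum pause duration), and co-occurrence simply means temporal overlap. The helper names and the data are hypothetical.

```python
def segment_runs(active, frame_s):
    """Convert a frame-wise 0/1 sequence into (start, end) intervals in seconds."""
    runs, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                      # a run of activity begins
        elif not a and start is not None:
            runs.append((start * frame_s, i * frame_s))
            start = None                   # the run has ended
    if start is not None:                  # activity continues to the last frame
        runs.append((start * frame_s, len(active) * frame_s))
    return runs


def overlaps(a, b):
    """True if the two (start, end) intervals share any stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def co_occurrence(talk_spurts, gesture_units):
    """Proportion of talk spurts that overlap at least one gesture unit."""
    hits = sum(any(overlaps(ts, gu) for gu in gesture_units) for ts in talk_spurts)
    return hits / len(talk_spurts)


def onset_lags(talk_spurts, gesture_units):
    """Gesture onset minus talk spurt onset for every overlapping pair.

    Positive values mean that speech starts before the gesture.
    """
    return [gu[0] - ts[0]
            for ts in talk_spurts for gu in gesture_units if overlaps(ts, gu)]


# Invented 100 ms frames, for illustration only.
speech = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
moving = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]

talk_spurts = segment_runs(speech, 0.1)
gesture_units = segment_runs(moving, 0.1)
proportion = co_occurrence(talk_spurts, gesture_units)
lags = onset_lags(talk_spurts, gesture_units)
print(len(talk_spurts), len(gesture_units), round(proportion, 2))
```

In this toy data one of the three talk spurts overlaps the single gesture unit, and the positive onset lag for that pair corresponds to speech leading gesture.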

[1] P. Wagner, Z. Malisz and S. Kopp, “Gesture and speech in interaction: An overview,” Speech Communication 57, pp. 209-232, 2014.
[2] A. Kendon, “Gesticulation and speech: Two aspects of the process of utterance,” In M. R. Key (Ed.), The relationship of verbal and nonverbal communication, pp. 207-227. The Hague: Mouton, 1980.
[3] D. McNeill, Hand and mind: What gestures reveal about thought, Chicago: The University of Chicago Press, 1992.
[4] D. Loehr, “Temporal, structural, and pragmatic synchrony between intonation and gesture,” Laboratory Phonology. Journal of the Association for Laboratory Phonology 3, pp. 71-89, 2012.
[5] T. Leonard and F. Cummins, “The temporal relation between beat gestures and speech,” Language and Cognitive Processes 26, pp. 1457-1471, 2011.
[6] M. Karpiński, E. Jarmolowicz-Nowikow and Z. Malisz, “Aspects of gestural and prosodic structure of multimodal utterances in Polish task-oriented dialogues,” Speech and Language Technology 11, pp. 113-122, 2009.
[7] S. Alexanderson, D. House, and J. Beskow, “Aspects of co-occurring syllables and head nods in spontaneous dialogue,” In Proc. of 12th International Conference on Auditory-Visual Speech Processing (AVSP2013). Annecy, France, 2013.

Predictive use of rhythm in speech perception and interaction
Sarah Hawkins (Centre for Music and Science, Faculty of Music, University of Cambridge)

It is increasingly accepted that, to jointly achieve coordinated actions, individuals entrain with each other’s rhythms, e.g. in speech and movement, and neural oscillations in their brains align in period and phase [1, 4, 8, 15]. Local phase adjustments enhance periodicity during increased attention [9, 12] and brain activity synchronizes during social interaction [6]. Such general claims, though compatible with entrainment across conversational turns [5, 18], seem at odds with phonetic and phonological observations that simple measures of rhythmicity in speech have limited generality [2, 3, 13, 14]. I will describe three behavioural studies that assess the role of rhythm and metre as a fundamental property of successful communication, first to predict lexical/sentential meaning, then to align conversational turns, and lastly to achieve joint action involving spontaneous speaking and improvisational music-making.

An eye-tracking experiment assessed the perceptual salience of differences in the internal acoustic structure of English weak syllables that are either prefixes (dis-, mis-, re-) or word-initial but not prefixes. The dis- and mis- stimuli contrasted in morphological status but shared the same initial four phonemes and sometimes more, e.g. discolour (prefix) vs discover (non-prefix), both /dɪsk/; mistypes vs mistakes, both /mɪst/. The re- stimuli contrasted in vowel phoneme and morphological status, e.g. re-peel, /ri:/, vs repeal, /rɪ, rə/. The auditory distinction is rhythmic: both syllable types are weak, but prefixes take a heavier beat [16]. The findings suggest that the rhythmic distinction alone can drive prediction of lexical identity.

The second study analyses 56 Question-Answer pairs from five dyads of adult English speakers conversing casually. Analyses using Loehr’s [10] pikes, here mainly the temporal location of f0 peaks on accented syllables (* and % in ToBI), suggest that most questions become rhythmic towards their end (pikes on the last 2-3 accented syllables are quasi-periodic) and that when the answer is straightforward, it typically starts on the same pulse as that set up by the end of the question. What carries that pulse is usually an f0 peak on the answer’s first accented syllable and/or a gesture such as a nod, but it can be a click or an in-breath. This suggests ‘embodied’ rather than solely ‘linguistic’ entrainment.
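One way to make "starts on the same pulse" concrete is sketched here as an assumption, not as the analysis actually used in the study: estimate the pulse period from the mean interval between the final pikes of the question, project that pulse forward, and express the answer's first pike as a phase within the projected cycle. The function names and time stamps are invented for the example.

```python
def pulse_period(pikes):
    """Mean inter-pike interval in seconds; needs at least two pikes."""
    if len(pikes) < 2:
        raise ValueError("at least two pikes are required")
    intervals = [b - a for a, b in zip(pikes, pikes[1:])]
    return sum(intervals) / len(intervals)


def relative_phase(event, last_pike, period):
    """Position of `event` within the projected pulse cycle.

    Returned in cycles, in the range [-0.5, 0.5): 0 is exactly on a
    projected beat, negative is early, positive is late.
    """
    cycles = (event - last_pike) / period
    return (cycles + 0.5) % 1.0 - 0.5


# Invented pike times (seconds), for illustration only.
question_pikes = [2.10, 2.62, 3.12]   # last three accented syllables
answer_first_pike = 4.16              # first accented syllable of the answer

period = pulse_period(question_pikes)
phase = relative_phase(answer_first_pike, question_pikes[-1], period)
print(round(period, 3), round(phase, 3))
```

A phase near zero would indicate that the answer picks up the question's pulse; the same calculation would apply if the pulse-carrying event were a nod, a click or an in-breath rather than an f0 peak.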

The third study tests the hypothesis that rhythmic entrainment is domain-general. The five dyads improvised on percussion instruments. Extracts were analysed in which they were talking during, or immediately before or after, fluently-maintained musical bouts [7]. Spoken pikes and musical pulses were more tightly aligned when interactants played more rhythmically together. This synchronization suggests that entrainment in one domain carries into the other.

Pikes and musical pulse reflect metre—an emergent organisation constructed by the brain in response to stimuli that are perceived as rhythmic [11]—with which the timing of accented syllables in English conforms quite nicely. In conversations, regular pulse seems likely to be only intermittent, localised to parts of the signal where coordinated action is needed, or where it is especially important for other reasons that attention is focussed [17]. I will suggest that when attentional focus cycles through a range of metrical domains, the occurrence of communicatively valuable information of many different types may be accurately predicted, and that entrainment to metrical pulse, even if intermittent, may reinforce a sense of well-being that encourages people to converse.

[1] Arnal, L. H. & Giraud, A.-L. (2012) Cortical oscillations and sensory predictions. TRENDS in Cognitive Sciences 16, 390-398. (10.1016/j.tics.2012.05.003)
[2] Arvaniti, A. (2009) Rhythm, timing and the timing of rhythm. Phonetica 66(suppl 1-2), 46-63.
[3] Arvaniti, A. (2012) The usefulness of metrics in the quantification of speech rhythm. Journal of Phonetics 40, 351-373. (10.1016/j.wocn.2012.02.003)
[4] Fujioka, T., Trainor, L. J., Large, E. W. & Ross, B. (2012) Internalized timing of isochronous sounds is represented in neuromagnetic Beta oscillations. Journal of Neuroscience 32, 1791-1802.
[5] Garrod, S. & Pickering, M. J. (2009) Joint action, interactive alignment, and dialog. Topics in Cognitive Science 1, 292-304. (10.1111/j.1756-8765.2009.01020.x)
[6] Hasson, U., Ghazanfar, A. A., Galantucci, B., Garrod, S. & Keysers, C. (2012) Brain-to-brain coupling: a mechanism for creating and sharing a social world. TRENDS in Cognitive Sciences 16, 114-121. (10.1016/j.tics.2011.12.007)
[7] Hawkins, S., Cross, I. & Ogden, R. (2013) Communicative interaction in spontaneous music and speech. In Language, Music and Interaction. Orwin M, Howes C, Kempson R (Eds.) pp. 285-329. London: College Publications.
[8] Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G. & Schroeder, C. E. (2005) An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology 94, 1904-1911. (10.1152/jn.00263.2005)
[9] Large, E. W. & Jones, M. R. (1999) The dynamics of attending: How people track time-varying events. Psychological Review 106, 119-159.
[10] Loehr, D. (2012) Temporal, structural, and pragmatic synchrony between intonation and gesture. Laboratory Phonology 3, 71-89. (10.1075/gest.7.2.04loe)
[11] London, J. (2012) Hearing in Time. 2nd ed. Oxford: Oxford University Press.
[12] Müller, V., Sänger, J. & Lindenberger, U. (2013) Intra- and inter-brain synchronization during musical improvisation on the guitar. PLoS ONE 8, e73852. (10.1371/journal.pone.0073852)
[13] Nolan, F. J. & Asu, E. L. (2009) The pairwise variability index and coexisting rhythms in language. Phonetica 66, 64-77.
[14] Nolan, F. J. & Jeon, H.-S. (2014) Speech rhythm: A metaphor? In Communicative Rhythms in Brain and Behaviour. Smith R, Rathcke T, Cummins F, Overy K, Scott S (Eds.) pp. 20130396. London: Philosophical Transactions of the Royal Society B.
[15] Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S. & Puce, A. (2008) Neuronal oscillations and visual amplification of speech. TRENDS in Cognitive Sciences 12, 106-113. (10.1016/j.tics.2008.01.002)
[16] Smith, R., Baker, R. & Hawkins, S. (2012) Phonetic detail that distinguishes prefixed from pseudo-prefixed words. Journal of Phonetics 40, 689-705. (10.1016/j.wocn.2012.04.002)
[17] Vesper, C., van der Wel, R. P. R. D., Knoblich, G. & Sebanz, N. (2011) Making oneself predictable: Reduced temporal variability facilitates joint action coordination. Experimental Brain Research 211, 517-530.
[18] Wilson, M. & Wilson, T. P. (2005) An oscillator model of the timing of turn-taking. Psychonomic Bulletin and Review 12, 957-968.

Unraveling the Consonant System of Puinave, a Native Language of Colombia
Leo Wetzels (Vrije Universiteit Amsterdam)

The Wãnsöhöt, more generally known as Puinave, are a relatively large indigenous group, with about 3500 speakers, who are located in two areas. The larger group lives in Colombia, in the region of the Inírida River. The other, smaller group lives on the shores of the Venezuelan Orinoco. Girón (2008) proposes the following system of underlying consonantal phonemes for this language:

(1) p t k ʔ, h

Wãnsöhöt possesses a set of partially nasal allophones which, in Girón’s analysis, are the word-initial realizations of underlying nasal consonants: [##mbV], [##ndV]. Languages possessing this contour type usually do not contrast nasal consonants with (non-sonorant) voiced stops, */p, b, m/. Instead, they oppose a series of voiceless stops, the /P/ class, with a series of phonemes that is represented by a set of allophones (or a subset thereof), which, for the labial place of articulation, is [m͜b, b͜m, b͜m͜b, m, b], the /{M,B}/ class. It was argued in Wetzels (2008) that contour stops of the kind under discussion may have different lexical sources, either /m/ or /b/, depending on the characteristics of the phonological grammar of the language. As a general rule, when contour stops occur only word- or syllable-initially, they derive from underlying non-sonorant voiced stops. Consequently, from the perspective of this hypothesis, the underlying system of consonantal phonemes of Wãnsöhöt should be as in (2) below, rather than the one proposed by Girón (2008).

(2) p t k ʔ, h

In this presentation I will focus on the issue of partially nasalized consonants, addressing the questions: a. What are the properties of the phonological grammars in which these sounds occur? b. What is the phonetic motivation for their emergence? c. What are the diagnostics capable of revealing their underlying source?

Having answered these questions, I will move on to the analysis of the consonant system of Wãnsöhöt. I will evaluate the arguments presented by Girón for positing a series of nasal phonemes for this language and the ensuing consequences in terms of the generalisations that must be posited to account for the phonetic surface structure. I will then argue in favour of an analysis in which a series of non-sonorant voiced stops replaces the nasal consonants, and show that it leads to a more natural and less complex analysis, which, moreover, brings Wãnsöhöt in line with the cross-linguistic typology observed for the distribution of contour stops.

[*] Jesús Mario Girón. 2008. Una Gramática del Wãnsöhöt. PhD dissertation, Vrije Universiteit Amsterdam. Published in the LOT Dissertation series, nr 185.
[*] W. Leo Wetzels. 2008. ‘Thoughts on the Phonological Definition of Nasal/Oral Contour Consonants in Some Indigenous Languages of South-America’. Revista ALFA 52(2): Abordagens em Fonética e Fonologia: estudos auditivos, acústicos e perceptivos; modelos de análise fonológica de ontem e de hoje. 251-278. São Paulo.