About the project
We study how changes in the grammar of languages ('innovations') spread from a small number of speakers to a larger section of the population ('diffusion'). We use Twitter to gather large quantities of localisable data from many places across large areas; this allows us to investigate language variation and change in fine geographical detail.
Using Twitter, we are collecting multiple datasets ('corpora') of tweets in English and Welsh in Britain; in Norwegian, Swedish, Danish, Icelandic, and Faroese across the Nordic countries; and in Turkish in Turkey. The selection of these languages allows us to compare the effects of very different demographic and geographic scenarios on patterns of diffusion: Welsh as a minority language versus English or Turkish as majority languages; the low population density in Norway versus the high population densities in large parts of England.
We are in the process of identifying language changes currently diffusing in these populations and investigating their distribution in these corpora. So far, this has included the spread of a new second-person pronoun chdi ('you') in Welsh, the deletion of the present-tense auxiliary form of 'be' in Welsh (replacing dan/dyn/ryn/yn ni’n gweld with ni’n gweld for 'we see'), and the alternation between different forms of the English dative construction ('give it to me', 'give it me', or 'give me it').
We plan to use the interactions between users in our corpora (retweets, @ mentions, mutual following) to construct a model of these users' social network. We will then be able to compare how well this network model predicts the pathway of diffusion with how well a purely geographical model does. Our results will be demonstrated in action through web apps that predict users' origins from their responses to questions about their language use, and the results will also be made available to the public via an online atlas-style website.
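As a rough illustration of how user interactions might be turned into a network model, the sketch below builds an undirected, weighted graph from a list of interaction records. It is not the project's actual method: the record layout, the function name build_network, the sample users, and the weights given to each interaction type are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical weights per interaction type (illustrative assumptions only).
WEIGHTS = {"retweet": 1.0, "mention": 1.0, "mutual_follow": 2.0}


def build_network(interactions):
    """Return an undirected weighted adjacency map: user -> {neighbour: weight}.

    `interactions` is an iterable of (source_user, target_user, kind) tuples,
    a made-up record format used here for illustration.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for source, target, kind in interactions:
        if source == target:
            continue  # ignore self-interactions
        weight = WEIGHTS[kind]
        # Add the weight in both directions so the graph stays undirected.
        graph[source][target] += weight
        graph[target][source] += weight
    return {user: dict(neighbours) for user, neighbours in graph.items()}


# Invented sample data, not drawn from any real corpus.
sample = [
    ("ann", "bob", "retweet"),
    ("bob", "ann", "mention"),
    ("ann", "cat", "mutual_follow"),
    ("cat", "cat", "mention"),
]
network = build_network(sample)
```

Edge weights in such a graph could then be set against geographical distance between the same pairs of users when comparing the two models as predictors of diffusion.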