How do language dynamics -- in acquisition, processing or historical change -- relate to the structure of linguistic systems? How do social and cognitive factors interact in shaping human languages? I explore these questions using experiments, statistical analyses of large corpora, and computational modelling, including new neural network approaches to learning and prediction. Most of my recent work is focused on words. I look at how words are created, how they are learned and encoded in memory, and how the patterns in known words are generalised in creating and processing new words. This investigation brings in many aspects of language: morphology, semantics, phonology, social factors, and discourse structure.
In on-line forums, communities of users often converge in their language and opinions. This project is focussed on one particular aspect of this phenomenon, namely phrases that express a degree on a scale, such as a scale of happiness, approval, cost, or quality. These phrases are of special interest for analysing on-line language, because exaggerations along various scales are one way that groups of users can reinforce each other's views of the world, and distinguish their own group from other groups. Such mutual reinforcement can have negative repercussions when it involves divisive or false attitudes and beliefs, providing an important example of what are called echo-chamber effects. Echo-chamber effects have been implicated in the rise of violence and extremism, political gridlock, and decreases in social mobility.
Using archives of the popular on-line forum Reddit, the project will develop a large-scale and experimentally normed data set of scalar expressions. It will develop next-generation NLP algorithms for assessing and predicting the meanings of scalar expressions. Advanced graph-based machine learning methods will be used to analyse the social network of Reddit users, and to integrate the language analysis with the social network analysis in order to develop predictors of social cohesion and fragmentation. The project is a collaboration with Xiaowen Dong of the Oxford Man Institute.
Sponsor: UK EPSRC: Responsible Natural Language Processing for Intelligent Interfaces
The shared vocabulary of a language community may be the ultimate public good, supporting cooperation and collective intelligence at a scale that is unparalleled in other species. The goal of the Wordovators project was to understand how complex shared vocabularies are created, negotiated and transmitted within communities. A collaboration of the Oxford e-Research Centre with Northwestern University and the New Zealand Institute of Language, Brain and Behaviour, the project combined large-scale experiments in the form of computer games with mathematical and computational analysis.
Sponsor: John Templeton Foundation
The SWORDFISH project (Spoken WOrdsearch with Rapid Development and Frugal Invariant Subword Hierarchies) sought to develop algorithms for detecting words and phrases in audio recordings of under-resourced languages. My group collaborated on this project with scientists and engineers at the International Computer Science Institute in Berkeley, the University of Washington, Ohio State University, and Columbia University. Our effort was focused on semi-supervised and unsupervised methods for learning the most productive morphological patterns in languages with rich morphology.
Sponsor: IARPA BABEL Program
Sponsor: James S McDonnell Foundation