Beyond the lab: using big data to discover principles of cognition
July 9-12, 2017 | Madison, Wisconsin, USA
With more than 100 years of collective practice, experimental psychologists have become highly sophisticated in their application of well-controlled laboratory experiments to reveal principles of human cognition and behavior. This approach has yielded rigorous experimental designs with extensive controls and it should be valued and encouraged. But the very expertise with which psychologists wield their tools for achieving laboratory control may now be limiting our field's ability to discover principles of cognition by going beyond the lab.
This workshop will focus on two “beyond the lab” approaches that have seen explosive growth in the last five years (for example, in just 2015 about 7000 scholarly articles made use of the Amazon Mechanical Turk crowdsourcing service). The first focus is on extending traditional laboratory techniques beyond the lab. These include the use of crowdsourcing services to conduct experiments of the type that are impractical or impossible to conduct in the lab, “gamifying” traditional data-collection such that participants actively want to participate in our studies, and organizing contests as a way of efficiently exploring the solution space to projects that are beyond the capabilities of a single research team. The second focus is on using “naturally occurring datasets” wherein creative interrogation of a diverse range of large, real-world data sets can reveal principles of human judgment, perception, categorization, decision making, language use, inference, problem solving, and mental representation. Both of these approaches fit into the broader “big-data” initiatives that are transforming the social sciences.
Using Large-Scale Datasets to Advance Developmental Theory
Stanford University, USA
How can we learn about developmental psychology from large-scale observational datasets? I'll discuss a couple of projects that explore data on children's vocabulary development and general developmental milestones. These projects make use of R and the tidyverse to work with relatively large databases in a fairly painless way, so I'll discuss this workflow as well.
Studying Language Development Using Daylong Home Audio Recordings
University of California, Merced, USA
Daylong home recordings are transforming the study of early communication development. I will present some of my own research focused on how these data can be used to study exploration dynamics and social rewards in the development of speech sound production in infancy. I will also talk about some international efforts to develop an accessible database of these recordings and to connect behavioral scientists with automatic speech recognition experts to develop better tools for working with this type of data.
Mapping the Lexicon Using Large-Scale Empirical Semantic Networks
Simon De Deyne
University of Leuven, Belgium
An individual's mental lexicon contains the knowledge about words acquired over a lifetime. The Small World of Words project is a crowdsourced study that aims to map this lexicon in the major world languages. It uses free word associations to elicit information about our lexical knowledge with minimal constraints. The large scale of this approach means it covers most of the words a person knows, and this is an important aspect of this work, as it enables us to study a more representative sample of different types of words and the semantic relations that contribute to their meaning. One way to represent how the human lexicon is organized is by studying networks, where in this case the network is obtained empirically. In the first part of this talk, I will discuss a series of studies showing how word meaning is organized in this network at different scales. At the global scale, the results indicate an organization that distinguishes positive and negative words, in line with the seminal work by Charles E. Osgood, and extend that work by identifying concreteness as one of the main determinants of structure in the mental lexicon. At an intermediate scale, the network indicates that most semantic categories are structured along thematic links rather than taxonomic links. In the second part of this talk, I will elaborate further on how meaning is structured within different domains using a series of experiments where participants judge relatedness. I show that relatedness can be derived by investigating the overlap between the response distributions of words, and propose a dynamic approach to infer meaning using random walks. This allows us to estimate how similar remotely related words are even if they're not directly related. A recurring and important question is how external representations derived from language or other sources inform our mental representations.
We might expect that our mental representations not only reflect the statistics of language, but are also informed by other types of non-linguistic information. The final part of this talk tries to provide some answers to this question by investigating to what degree language and mental representations encode modal information from perception and internal emotive states.
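The idea of comparing response distributions directly versus through random walks can be illustrated with a minimal Python sketch. It is not the Small World of Words implementation: the toy association counts, the decay parameter `alpha`, the walk length `steps`, and all function names are assumptions made for illustration. The sketch row-normalizes cue-response counts, accumulates decayed visit probabilities over walks of increasing length, and compares words by the cosine of the resulting profiles.

```python
from math import sqrt

# Toy cue -> response association counts (made-up values, illustration only).
ASSOCIATIONS = {
    "dog": {"cat": 3, "bone": 2},
    "cat": {"dog": 3, "mouse": 2},
    "mouse": {"cat": 2, "cheese": 3},
    "cheese": {"mouse": 3},
    "bone": {"dog": 2},
}


def transition_probs(assoc):
    """Row-normalize counts so that each cue's responses sum to 1."""
    probs = {}
    for cue, responses in assoc.items():
        total = sum(responses.values())
        probs[cue] = {resp: count / total for resp, count in responses.items()}
    return probs


def random_walk_profile(probs, start, alpha=0.75, steps=10):
    """Sum visit probabilities over walks of length 1..steps, decayed by alpha**k.

    alpha and steps are hypothetical tuning parameters, not values from the talk.
    """
    current = {start: 1.0}
    profile = {}
    for k in range(1, steps + 1):
        following = {}
        for node, mass in current.items():
            for neighbor, p in probs.get(node, {}).items():
                following[neighbor] = following.get(neighbor, 0.0) + mass * p
        for node, mass in following.items():
            profile[node] = profile.get(node, 0.0) + (alpha ** k) * mass
        current = following
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


PROBS = transition_probs(ASSOCIATIONS)

# Direct overlap: "dog" and "cheese" share no responses in the toy data.
direct_similarity = cosine(PROBS["dog"], PROBS["cheese"])

# Random-walk overlap: indirect paths (dog -> cat -> mouse -> cheese) add shared mass.
walk_similarity = cosine(
    random_walk_profile(PROBS, "dog"),
    random_walk_profile(PROBS, "cheese"),
)
```

In this toy network the direct response distributions of "dog" and "cheese" do not overlap at all, while their random-walk profiles do, which is the sense in which remotely related words can still receive a nonzero similarity.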
Data from Online Video Games: Solving Real-World Problems and Evaluating Design Decisions
Northeastern University, USA
Telemetry data gathered from online video games provides a wealth of useful information. In this talk, I will discuss work in video games for citizen science and crowdsourcing, in which gameplay data from players can be applied to solving real-world problems or making scientific discoveries. This data can also be used to evaluate, and potentially automate, design decisions, with the aim of improving the games as engaging problem-solving systems.
Megastudies: From Visual Word Recognition to Cable News Biases
Washington University in St Louis, USA
Megastudies have provided important leverage in the evaluation of computational models of visual word recognition. Moreover, these large databases have provided insights into novel predictor variables and afforded the development and application of nonstandard analytic techniques. Such databases have been developed via multi-university collaboration and, more recently, with smart devices, with similar results. Such megastudies use standard word recognition tasks (e.g., lexical decision and pronunciation) to develop the databases. One such megastudy, the English Lexicon Project, will be discussed. This approach will be briefly contrasted with a naturally occurring database that was used to evaluate the timely topic of media bias in cable news networks.
How Can Artificial Vision Models Teach Us About Human Scene Understanding?
Fordham University, USA
Putting the ACL in Computational Social Science
Cornell University, USA
This talk will focus on the effect of phrasing, emphasizing aspects that go beyond just the selection of one particular word over another. The issues we’ll consider include: Does the way in which something is worded in and of itself have an effect on whether it is remembered or attracts attention, beyond its content or context? Can we characterize how different sides in a debate frame their arguments, in a way that goes beyond specific lexical choice (e.g., “pro-choice” vs. “pro-life”)? The settings we’ll explore range from movie quotes that achieve cultural prominence; to posts that catch on or change minds on Facebook, Wikipedia, Twitter, arXiv, and the ChangeMyView subreddit; to language that affects discussion points among the members of the Federal Open Market Committee (FOMC).
Joint work with Lars Backstrom, Justin Cheng, Eunsol Choi, Cristian Danescu-Niculescu-Mizil, Jon Kleinberg, Vlad Niculae, Bo Pang, Jennifer Spindel, and Chenhao Tan.
The Dynamics of Choice Imitation as Revealed by Baby Names
Indiana University Bloomington, USA
Learning Curves From Game Data
University of Sheffield, United Kingdom
Data from players of online and offline games allow the history of players' actions to be connected to their eventual level of performance. We data mine game data to recover this "natural history" of people's learning, testing and extending old theories from experimental psychology, whilst being forced to adapt analytic methods to cope with abundant rather than scarce data.
Language and Mood in Social Media
A number of social media studies have equated people's emotional states with the frequency with which they use affectively positive and negative words in their posts. We explore how such word frequencies relate to a ground truth measure of both positive and negative emotion for 515 Facebook users and 448 Twitter users. We find statistically significant but very weak (ρ in the 0.1 to 0.2 range) correlations between positive and negative emotion-related words from the Linguistic Inquiry and Word Count (LIWC) dictionary and a well-validated scale of trait emotionality called the Positive and Negative Affect Schedule (PANAS). This suggests that for the typical user, dictionary-based sentiment analysis tools may not be sufficient to infer how they truly feel.
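The kind of analysis described above can be sketched in a few lines of Python. The abstract does not spell out the computation, so this sketch makes several assumptions: that the coefficient is a Spearman rank correlation, that word use is measured as the proportion of tokens found in a word list, and that whitespace tokenization is adequate. The tiny `POSITIVE_WORDS` set is a hypothetical stand-in for a dictionary category (it is not LIWC), and all function names are illustrative.

```python
from math import sqrt

# Hypothetical stand-in for a positive-emotion word category (NOT the LIWC dictionary).
POSITIVE_WORDS = {"love", "happy", "great"}


def positive_word_rate(posts, lexicon):
    """Proportion of whitespace-separated tokens across posts that are in the lexicon."""
    tokens = [tok for post in posts for tok in post.lower().split()]
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

With one word rate and one questionnaire score per user, `spearman(word_rates, affect_scores)` would give a coefficient comparable in kind to the one reported; a value near 0.1 to 0.2 would mean that word rates order users only loosely by self-reported affect.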
Data on the Mind: A Community Resource for Naturally Occurring Data in Cognitive Science
University of California, Berkeley, USA
Cognitive scientists are increasingly interested in using big data and naturally occurring datasets to understand human behavior and cognition. These large-scale and often messy datasets can offer unique insights into real-world behavioral and cognitive dynamics, but working with them requires a new mindset and an expanded toolkit. Data on the Mind is a community-focused initiative to help cognitive scientists meet the unprecedented challenges and opportunities of big data and naturally occurring datasets. In this talk, I introduce Data on the Mind and present some projects on real-world data that have come from it.
Widening the Scope of Big Data: The Importance of Multi-Dimensional Analyses for the Understanding of Human Language and Behavior
Arizona State University, USA
The amount and types of data that can be collected on our behaviors are growing at a rapid rate. From the text messages we send to our friends to the articles we share online, we are constantly generating data that can be used to define, constrain, and predict our behaviors. Discussions surrounding big data often focus on the amount of data available to be analyzed (e.g., large corpora of forum posts); however, the numerous types of information that can be extracted from individual data points are less frequently emphasized. In this talk, Dr. McNamara will describe how we can glean important insights into the nature of human behavior by employing big data approaches from a multi-dimensional perspective. In particular, she will focus on research related to discourse comprehension and production and the insights that can be gleaned from analyzing these complex processes from multiple angles. Examples will be provided that describe how information collected from complex data and analytic techniques can be used to enhance psychological theories, as well as to address real-world issues such as learning.
Examining Individual Differences in Cognition in Three Different Online Populations
University of Wisconsin-Madison, USA
I'll be presenting three projects making use of different online populations. The first comes from a MOOC on Video Games and Learning where students participated in two short perceptual/cognitive tasks as part of their homework for the course. Here we examined the association between previous video game experience and skill on the tasks. The second comes from a study using MTurk participants. Here we assessed individual difference predictors of learning and learning generalization. The third (still in progress) examines individual difference predictors of expertise development in the online video game League of Legends. In addition to discussing the empirical results, I'll also cover some of the methodological challenges and our current solutions (e.g., keeping visual field of view constant across participants despite wide variation in monitor size; bringing participants back across sessions on MTurk to measure learning/learning generalization; recruitment issues; etc.).
What Can Children Learn from Six Million Words?
University of California, Riverside, USA
How does the structure of speech to children contribute to the development of semantic knowledge? A full investigation of this question requires two things: a realistic model of the speech that children hear, along with a well-specified model of the learning and memory system. To this end, we constructed a model of the development of semantic memory using a corpus of 6 million words of naturalistic child-directed speech, which was used as input for a recurrent neural network model that learns word meanings as a function of the distributional statistics of language. This model revealed many interesting things. First, a model of this sort learns very well, acquiring highly detailed and structured semantic knowledge. Second, the model shows that a number of features of semantic development emerge from the learning of distributional statistics, such as progressive differentiation of concepts into hierarchical structure, a bias for taxonomic relationships, and robust differences between artifacts and natural kinds. Third, the model shows that the nature of child-directed speech matters significantly, as experiments that manipulated the structure and sequence of the corpus had significant effects. Finally, the model provides insight into differences in meaning that emerge from language statistics, compared to more grounded sources of meaning. This research demonstrates that the structure of speech to children, coupled with a prediction-based learning and memory system, is useful for the construction of a robust semantic memory, and that many classically noted properties of semantic memory emerge naturally from the interactions in such a system.
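To make the notion of learning meaning from distributional statistics concrete, here is a deliberately simplified Python sketch. It is not the recurrent neural network described above: it swaps in plain next-word counts as a crude stand-in for prediction-based learning, and the handful of utterances, the function names, and the whitespace tokenization are all assumptions for illustration. Words that predict similar following words end up with similar vectors.

```python
from collections import Counter, defaultdict
from math import sqrt

# Made-up utterances standing in for child-directed speech (illustration only).
UTTERANCES = [
    "the dog eats food",
    "the cat eats food",
    "the dog drinks water",
    "the cat drinks water",
    "the truck needs fuel",
]


def next_word_counts(utterances):
    """For each word, count which words immediately follow it."""
    counts = defaultdict(Counter)
    for utterance in utterances:
        tokens = utterance.split()
        for word, following in zip(tokens, tokens[1:]):
            counts[word][following] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


COUNTS = next_word_counts(UTTERANCES)

# "dog" and "cat" are followed by the same words, so their vectors align.
animal_similarity = cosine(COUNTS["dog"], COUNTS["cat"])

# "dog" and "truck" share no following words in this toy corpus.
cross_kind_similarity = cosine(COUNTS["dog"], COUNTS["truck"])
```

Even in this toy corpus the two animal words pattern together and apart from the artifact word, a small-scale analogue of category structure arising from the statistics of the input alone.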