Catching the same fish twice: How wide is the MTurk net?
Tuesday, June 28, 2016
I sometimes use MTurk for running experiments. As an old-timer, there is something of a magical quality to MTurk. You post your study and only a few hours later you receive data from perhaps one thousand or so people. I can remember when we used to meet the people participating in our studies face-to-face, with experiments run one person at a time, and time for chatting with participants after the study. It was slow, but then again, you might catch a serendipitous finding!
The kind of speed MTurk offers is extremely appealing of course. Lots of us are using MTurk now. Last year I estimated that there are many thousands of journal articles with results from the MTurk population, with a doubling of the total in the last year or so. MTurk is especially appealing for very short duration studies where each person completes a simple task in perhaps five minutes or less. Cost-wise, it is not necessarily a lot cheaper, but the researcher time saved is often significant. (You are all paying an acceptable wage, right? A good chunk of MTurk participants have MTurk as their primary source of income.)
Many key experiments have been replicated with the MTurk population, with classic findings replicated in Psychology, Economics, and Political Science. And there has been research into factors that affect the quality of the data collected and the use of catch questions to this end, which we reported on this blog earlier.
But who are these MTurk workers? We already have a fairly good idea. Paolacci and Chandler recently summarized what is known, primarily from surveys of the workers run on MTurk. Workers are from around the world, tend to be young, have a different ethnic mix from their home country's population, and have higher levels of education. There are also psychological differences, with MTurkers a little more introverted and neurotic. They are also often doing something else as well as your study, like watching TV. But at least they are less WEIRD than our university participant panels.
And how many MTurkers are there really? I got together with six other researchers from around the world to use a method from ecology called capture-recapture analysis to estimate the reach of each of our laboratories. The logic is simple. On the first day you catch, tag, and release fish, say 50. On the second day you catch fish again. Say you catch 40 and one quarter of them are tagged. If the second catch is representative, then about one quarter of all the fish in the lake are tagged, and since you know 50 fish were tagged on the first day, you can infer that there are about 200 fish in the lake. By using information about who comes back to our experiments, we estimated that the average laboratory (so perhaps yours) can reach a population of about 7,300 people on MTurk. There is something of a discrepancy between this figure and the headline figure of 500,000 registered workers, but it is not that surprising: we would expect activity to vary widely across the 500,000, with a few highly active people and many inactive ones.
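For anyone who wants to play with the fish arithmetic, here is a minimal sketch of the simple two-sample version of the idea (usually called the Lincoln-Petersen estimator). It is only an illustration of the logic above, not the analysis we ran on the MTurk data, and the function and argument names are made up for this example.

```python
def lincoln_petersen(tagged_first, caught_second, tagged_in_second):
    """Two-sample capture-recapture estimate of population size.

    Assumes a closed population and that tagged and untagged animals
    are equally likely to be caught on the second day.
    (Function and argument names are hypothetical, chosen for this sketch.)
    """
    if tagged_in_second <= 0:
        raise ValueError("need at least one recapture to form an estimate")
    # Share of the second catch that is tagged ~ share of the lake that is tagged,
    # so: tagged_in_second / caught_second ~ tagged_first / population.
    return tagged_first * caught_second / tagged_in_second


# The lake example: 50 tagged on day one, 40 caught on day two,
# one quarter of which (10 fish) carry a tag.
estimate = lincoln_petersen(tagged_first=50, caught_second=40, tagged_in_second=10)
print(estimate)  # 200.0
```

The estimate leans heavily on the assumption that everyone is equally catchable, which is exactly what you would not expect on MTurk, where a few workers are far more active than the rest.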
Running larger batch sizes helps increase your reach, but larger payments shrink your reach. We think that paying more attracts the most prolific workers more quickly, effectively narrowing the pool. Some of our MTurkers were highly experienced; the majority of our participants had taken part in experiments from at least two of our laboratories. Given that we are only seven of likely hundreds of researchers, it is a fair bet that many participants have taken part in more experiments than you have ever run, or maybe even ever read about.
Why does this matter? Well, first of all, we are all sampling from the same MTurk population. In the old days, even the most enthusiastic participant could only take part in experiments on campus. Now the most enthusiastic participant could take part in a very large share of the experiments being run right across the world. This experience can inflate measures on some tasks and reduce effect sizes for popular paradigms, such as economic games like the prisoner's dilemma. If you are thinking of including the cognitive reflection test, for example, many of your participants have already done it.
Previous experience on MTurk predicts performance on the "original" cognitive reflection test questions, but not on a newer version of the test with items MTurkers are less likely to have seen before. And this raises something of a commons dilemma: to some extent we all share the same resource, so including a measure in your study may spoil the pool for other researchers. Is it critical that your participants believe your instructions (maybe you are a behavioural economist, where deception is pretty much forbidden)? Let's hope they have not just completed a psychology study with a surprise test!
Outside the US we rely upon MTurk Data Consultants to help us access the MTurk platform. There are other alternatives to MTurk too. I've used Prolific Academic before and have replicated findings on both platforms. Or Google for "MTurk alternatives". I wonder what these participants are like? Hopefully the knowledge that MTurk is not the infinite resource it once seemed to be will help other platforms develop.