A couple of months ago, I had the opportunity to meet with a group of five researchers at University College London (UCL) who are currently using Prolific for their research. Our aim was to discuss one overarching question: “Where do you think participant recruitment is headed in your field?”
The conversation started with a round of introductions in the office of Dr. David Shanks, Head of the Division of Psychology and Languages at UCL. Joining us in Shanks’ office were Tom Hardwicke, a PhD candidate in experimental psychology; Caterina Paolazzi and Giulio Dulcinati, both PhD candidates in linguistics; and Dr. Dave Lagnado, senior lecturer and experimental psychologist.
To start, the group described the study designs they often employ. For instance, Paolazzi uses prescreening filters to target specific groups of participants to read and rate sentences for her research on syntax and morphology. Dulcinati usually implements performance-based rewards for his research on the role of cooperation in communication. Lagnado runs one-off studies that measure participants’ reaction times as part of his investigation into the psychological processes that underlie human learning. Hardwicke, on the other hand, frequently works with longitudinal study designs. As part of his research on human memory, he has run studies that instruct participants to return at precise time slots over multiple days.
The researchers in the group mainly collect data in the lab or online. How do the researchers feel about these two different data sources? We discussed concerns regarding speed of data collection, diversity of participant pool, and data quality.
For most in the group, the move towards recruiting online was motivated by the need to collect greater volumes of data rapidly. In Paolazzi’s words: “I started using [Prolific] because he [Dulcinati] told me that in four hours you will have your data. And this changes everything because usually it takes me two weeks [in the lab] to get my data, and I think this is a big incentive for people [to recruit online].” Paolazzi also emphasized how the speed of data collection can influence both the quantity and quality of the studies she runs during her PhD: “For us [PhD candidates], we only have three years to run a bunch of [studies].” It is thus an advantage if each study can be completed faster. Speedy data collection, Paolazzi adds, makes it “more likely that you will run more pre-tests – stuff that you would otherwise cut.” This, in turn, could mean better science.
Apart from accelerating the pace at which new scientific research progresses, Lagnado noted that online recruitment has the additional value of making it easier to reexamine past findings in one’s field: “You can revisit a lot of old issues in psychology when people just didn’t have these facilities. Before you had to bring people into the lab. Now you don’t need to do that and it’s a bit of a game changer in terms of what you can explore.”
In sum, when it comes to speed of data collection, online recruitment is unmatched, and this is probably the most important incentive for researchers to recruit online rather than in the lab.
There are considerably fewer restrictions on signing up as a participant on online recruitment platforms than at university labs. Naturally, this means researchers can find more diverse groups of participants online than in the lab. Sample diversity has become highly desirable in research. As Lagnado remarked, “You don’t want just students all the time, you want to sample the general public.” Indeed, it has been observed that samples collected in university labs are usually homogeneous with respect to demographic characteristics such as race, socioeconomic status, and education level. While homogeneous samples can certainly be useful in reducing noise in one’s data, there is a balance to maintain: a lack of heterogeneity can mean that observed effects do not replicate in the general population. In psychology, sample heterogeneity is a salient topic as researchers begin to question the external validity of research findings based solely on WEIRD samples – that is, samples from Western, Educated, Industrialised, Rich, and Democratic societies (Henrich, Heine, & Norenzayan, 2010). Online recruitment platforms can provide a remedy to the limited diversity of university lab samples.
Data quality concerns have sometimes held researchers and journals back from trusting the fidelity of online studies. When recruiting online, researchers have no physical contact with participants, so how can they verify their identities? And how can they verify that participants are giving the study their undivided attention? In a lab setting, researchers are perceived to have more direct control over their participants’ activity, so some may conclude that data collected online is of poorer quality than data collected in the lab. In reality, this is hardly the case. Shanks remarked, “There are no cases I am aware of where you clearly see qualitative differences [between data collected in the lab and online].” Shanks is not alone in his conviction that online samples are just as reliable as lab samples – an abundance of published studies report the same observation (e.g., Buhrmester, Kwang, & Gosling, 2011; Crump, McDonnell, & Gureckis, 2013; Paolacci, Chandler, & Ipeirotis, 2010).
While ensuring high data quality can be more of a challenge with online samples, there are preventive steps that online recruitment platforms and researchers can take to identify the ‘bad apples’ in the participant pool. For example, at Prolific we prevent duplicate participants by using a number of tracking mechanisms (including IP addresses). Moreover, our participants are required to verify their accounts via phone or Facebook. We additionally give researchers the option to review every submission before rewarding participants. In addition to these platform-level measures, we advise researchers to take their own, for instance by incorporating instructional manipulation checks (IMCs) in their studies. An IMC embeds an explicit instruction in a question to determine whether a participant is paying attention. Overall, we believe this joint effort in maintaining data quality has driven the number of ‘bad apples’ down to a minimum. In short, data quality is not compromised in online studies, and as long as we continuously work to weed out ‘bad apples’, this will remain the status quo.
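To make the IMC idea concrete, here is a minimal sketch of how a researcher might screen responses in their own analysis code. The field names, the expected answer, and the helper function are all hypothetical illustrations – they are not part of Prolific’s platform or API:

```python
# Hypothetical sketch of screening submissions with an instructional
# manipulation check (IMC). An IMC hides an instruction such as
# 'To show you are reading carefully, answer "blue" below' inside an
# otherwise ordinary question; inattentive participants answer it normally.

def failed_imc(submission, expected="blue"):
    """Return True if the submission's IMC answer does not match
    the instructed answer (case- and whitespace-insensitive)."""
    return submission.get("imc_answer", "").strip().lower() != expected

# Example submissions (invented data for illustration only).
submissions = [
    {"id": "p1", "imc_answer": "Blue"},  # passed: followed the instruction
    {"id": "p2", "imc_answer": "red"},   # failed: answered the surface question
    {"id": "p3", "imc_answer": ""},      # failed: skipped the item entirely
]

flagged = [s["id"] for s in submissions if failed_imc(s)]
print(flagged)  # ['p2', 'p3']
```

Flagged participants would then be excluded from analysis (or their submissions rejected during review), which is one concrete way the ‘joint effort’ between platform and researcher plays out in practice.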
If online recruitment is more efficient, offers more diverse samples, and provides equally reliable data compared to recruiting in the lab, then why don’t researchers move away from lab samples completely? One reason is that not all study designs are currently possible online – or at least not without extensive programming knowledge. However, we believe this is changing. Take, for instance, the new startup Cognilab (currently in private beta): soon Cognilab will officially release software that lets researchers create complex cognitive experiments (including precise reaction-time measurement) without writing a single line of code.
Study design possibilities aside, the UCL researchers agreed that although interest in running studies online is growing, researchers are likely to continue using both lab and online samples over the next few years. In fact, Hardwicke likes to use both sources in tandem. He often pilots in the lab in order to have some face-to-face interaction with his participants; the feedback he garners from these interactions can highlight improvements to make before he publishes his study online to a larger pool. Ultimately, it doesn’t have to be an either/or situation – both data sources have their pros and cons, and using them in parallel can have its advantages.
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3-5.
Crump, M. J., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PLoS ONE, 8(3), e57410.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61-83.
Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411-419.