Opening Up Online Research

This is an unedited preprint of an article that appeared as:

Halavais, A. (2011). Open up online research. Nature, 480(8 December), 174-175.

The mass adoption of networked communication and an emerging culture of open sharing have provided a boon for social scientists. Together they open a new window through which we may observe the experiences of individuals and groups that have been all too easily ignored by history, and they present the opportunity to examine not just a few eyewitness accounts of an event, but thousands. My own work has examined protests on Twitter, the ways in which people learn to become part of an online community through the Digg website, the ways in which political campaigners reach voters online, and the relationship of blogs to the spread of news. The US Library of Congress, which has acquired an archive of all public tweets since the service began in 2006, has announced that it will open access to this historical database, though it is not yet clear when or under what rules. Through social media, social science is entering an age of ‘big science’.

Collecting public messages on this scale can create ethical dilemmas for the researcher. Although these messages are generally publicly available, individuals may not recognize how public they are, or may be surprised when their words are preserved or placed in another context. In 2006, for example, Harvard researchers collected data from the Facebook profiles of Harvard students for a study of friendship and shared interests. Some questioned whether this data was really ‘public’, or truly anonymous. Similarly, also in 2006, America Online (AOL) released, for the purposes of research, a dataset of some 20 million search queries from 650,000 users. Although the data had been anonymized, journalists and others managed to link some individuals to their strings of search terms. The scandal resulted in a class action lawsuit and the resignation of AOL’s chief technology officer.

Working with large-scale public conversation is new ground for many researchers, and for most research ethics committees. What constitutes ethical research conduct in this area remains blurry. In the case of the Harvard Facebook study, the methodology passed ethical review but still caused controversy in some circles. Equally troubling is the other side of the coin: many studies that presented only minimal risk have been blocked by ethical review.

Research ethics committees (in the United States, institutional review boards, or IRBs) were initially created to review medical research, guard against ethical misconduct, and ensure that subjects are made aware of the risks of medical experimentation. By the late 1960s, oversight by such boards was being applied more broadly to privacy risks in social science research, prompting the famous anthropologist Margaret Mead to argue to the National Institutes of Health that her field did not, in fact, work with “subjects” but rather with “informants in an atmosphere of trust and mutual respect”. By the mid-1990s, particularly in the United States, such oversight was being extended to groups that these committees had earlier ignored, among them historians, journalists, and folklorists. As sociologist Laurie Essig of Middlebury College, Vermont, put it this August in a Chronicle of Higher Education blog post: “IRBs have treated speaking with someone as equivalent to experimenting on them and have almost killed fieldwork in the process”. The US model is becoming the global norm, as ethics boards in Europe and elsewhere begin to review scholarship in the social sciences and humanities.

The time and expense of intensive ethical review of online social science acts as a brake on such work, both slowing research and restricting the sharing of research data.

Some steps are being taken to resolve these issues, but more needs to be done. Journals and funding agencies can and should bring about a vast improvement in the human-subjects system simply by making ethical reviews more public. This would demystify the ethical considerations behind such work, enable researchers to learn from the successes of others, and encourage both companies and the public to entrust scientists with their personal information. Ethical oversight boards, for their part, should not apply the standards used for health risks to questions of privacy, particularly while societal perceptions of online privacy are in flux.


Any working social scientist will agree that ethical conduct of research is essential, but few will extol the virtues of the research ethics board. War stories abound: the protocol held up because someone on the committee felt the area of research was fruitless, or because of a spelling error, or because the research might bring too much controversy to the campus. Even when the issues are more central to the protection of subjects, the standards of approval are often ambiguous and informal. High levels of scrutiny are clearly necessary for a drug trial. But whether questioning gamers about dressing up as their characters for conventions would be traumatizing (to take an example from my students’ research) seems to be an issue best addressed by the researchers with the greatest exposure to the subjects and the culture being examined.

For those of us who research online interactions, it can be especially frustrating to face a board filled with members who have never used Facebook or played World of Warcraft. Although human subjects boards can, and sometimes do, bring in experts who can more directly address the context of the research, this happens less often than it should.

Members of review boards tend to be more comfortable with certain methods: for example, the hypothesis-driven experiments of the psychology lab rather than the inductive work done by ethnographers. The decisions of such boards can seem idiosyncratic and, by extension, capricious. This is particularly true when a multi-site study is approved by several boards but held up for changes by a few. In one colleague’s case, each of two IRBs insisted that the other approve the protocol first. This can easily lead to research gridlock, and has spawned a growing industry of professional ethics review expediters.

There have long been calls for improving the human subjects review process. The Institutional Review Blog, for example, chronicles the excesses of IRBs in the humanities and social sciences. The largest organization of ethics review professionals, Public Responsibility in Medicine and Research, provides a venue for discussing ways of improving the process and making it more effective. Perhaps most promisingly, the federal agency that oversees human subjects boards in the United States has recently announced its intention to revise the rules for research that exposes subjects to minimal risk. This work is still at the early stage of gathering public comment. It looks likely to reduce many of the burdens currently imposed on research into open discourse on the web, such as studies of blog posts or tweets. But it would not help to make standards of privacy clearer to researchers, students, or those who host participatory websites.

After researchers battle through the IRB process, the result, an IRB approval form, is usually tucked away in a drawer. Students are often inducted into this secretive process through a cursory overview of ethics in a methods course, and sometimes by being assigned to handle the IRB process for a project. Good and bad models of ethical research alike are difficult for students to come by, particularly when they tread less-trodden ground, such as ethnographic research on virtual communities.


The solution is not to do away with the IRB, but rather to make amendments that render its dysfunctions less acute.

The first step is to agree on a reasonable threshold for when IRBs should become involved. Although a minimum standard is set by federal oversight bodies, institutions have some discretion over their specific policies. Some universities, in an overabundance of caution, require work that carries only the most minimal risk to undergo review by an ethics committee before it can commence, often leading to months of delays. Research on texts, such as tweets or blog entries, is considered subject to ethics oversight by only some committees. This lack of clarity and consensus results in unnecessary oversight, taxing boards that could better spend their time on work that represents significant risk to vulnerable subjects. When the only risk is to adults’ privacy, for example, pre-review of protocols by an oversight committee is not necessary.

A second, complementary step is to ensure that upcoming students in all fields receive adequate ethical training. This will help to ensure that research not requiring IRB approval still has ethical thought behind it, and will discourage the notion that handing a proposal over to an IRB excuses researchers from considering ethics themselves. Independent ethical thinking should begin at the design stage of a study and continue through its completion and publication, rather than being treated as a single bureaucratic hurdle.

The greatest problem faced by the ethics system is its secrecy. Review boards must make decisions with limited access to previous cases. Once a decision is made, the outcome is often lost to other IRBs or researchers who could make productive use of the precedent, particularly in new areas of research such as online social science, where review boards are often inexperienced and would benefit most from the knowledge of others. One solution would be to require IRBs to be transparent in their decision-making. This, however, seems unlikely to succeed: the ethics review board is conservative by design, so we must look elsewhere for change.

Most government funding agencies and private foundations prominently promote open data sharing and collaboration. Were they to make the open publication of IRB protocols or ethical reflections a requirement for receiving funding, it would open a crack in an otherwise too-secretive process. Many journals expect social science research to have been reviewed by an IRB before submission; a few go so far as to require authors to sign a statement of IRB approval. None require that the protocol approved in ethics review be provided to the journal or published. They should. Although some information would need to be redacted from these materials, including anything that might violate the privacy of subjects or of the researchers, this material would throw open a new window on ethical considerations.

Enacting these changes would not only grease the wheels of social science research, but also help to convince the public that the work is both important to their well-being and being done in a trustworthy way. That in turn might make companies more willing to share their data with scientists. At present, it is difficult for researchers to study large datasets from Facebook and Twitter, for example, without entering into a research partnership with someone within those companies; the sites’ terms of use prohibit outsiders from ‘scraping’ large datasets.

By moving the consideration of ethical conduct beyond the localized IRB to the wider research community, we can evolve the kinds of standards and best practices that can serve to instruct not just the scholarly world, but the wider realms of government policy and corporate practice.
