On Not Scraping Orkut

Liz on Many2Many posts a pointer to this geomapping of orkutsters. I spoke to my class Monday night about the ethical dimensions of getting at that data.

The minute I first signed on to Orkut, I realized that it would be a great source of data to mine for several social science questions, so I did what every responsible researcher would do: I emailed Orkut asking for the data. And now, a couple (?) of weeks later, he (they) haven’t even bothered responding. It’s not an issue of invading the privacy of the users: they have clearly published their information to the public. However, Orkut’s ToS forbid scraping. I somehow doubt this map was created by typing the data in by hand :).

That isn’t really a surprise. I ran into the same thing with Slashdot, and colleagues have faced the same difficulty with eBay. The only exception I’ve seen was Wikipedia, which was very helpful. More irksome is that this data is often released to some researchers, but not others. I understand that, to a certain extent, but it runs so counter to the ethic of sharing in the academic world that it comes as a surprise when that ethic isn’t shared by those running websites, especially those serving the open source community, as in the case of Slashdot.

So, what do I do now? I personally have little ethical difficulty with scraping Orkut. Frankly, it is the same as hiring a student to sit there and transcribe it. I now hear that similar projects that did scrape sites faced no serious impediment from their human subjects panels. So now I feel like I have shot myself in the foot by not collecting that data and mapping it, something I had hoped to do early on.

I’m curious what others think of this. Is there an ethical issue in violating the ToS? Does this make me a bad researcher? What is the legal position? How long will it take after asking this publicly before I am exiled from Orkut? Or better yet, since private emails go unanswered, mightn’t someone from Orkut respond publicly to the issue? (Fat chance?) If I did scrape the data, should it then be made available as a database? Only to other researchers? (The site hosting this data makes me wonder if such questions are moot. I somehow doubt that “datapimp” is very fussy about his customers…)
