See, I said I would post soon!

Add this to the AOL leak: Someone ran across a collection of MySpace passwords, badly hidden on a phishing server. Again, it is a really interesting dataset (and no, he isn’t making the data available!), but one tainted by the absolutely unethical and illegal method of collection. Of course, you could reasonably ask whether passwords collected by a phishing attempt represent the average MySpace password. I seriously doubt that they do.

If this becomes common, the difficulty is manifest. Researchers who hold to even a modest standard of ethical behavior will be left in the dust by “amateurs” (in the kindest sense of the word) and professionals (in the least kind sense) who do not recognize the ethical problems of using data that has been taken against the wishes of its owners. We’ve already seen this: I’ve posted about the issues of scraping social network sites and about the AOL data. But is this the future of online research: a sea of questionable datasets, traded on the black market, and unavailable to the researchers who would most benefit from them?

What makes this most difficult is that the blogger mentioned above has arguably brought no harm to the individual users. He has analyzed the data in the aggregate and has not revealed anything that directly affects most users. Moreover, I don’t think he did much to violate their trust: the phishers did that, and then just left the data lying around. Nonetheless, I think this represents another case in which the researcher has to close her eyes to it and just say no. It’s not an easy thing to do, though.

It raises another issue. While professionally we would clearly be barred from using the data in, say, a publishable paper, what about blogging it? On the AOL data, I decried its invasion of privacy, and then turned around and blogged about it. I think it’s also clear that distributing results via a blog is no less damaging than publishing them in a research journal. Does that make my earlier post ethically questionable? What a mess.

More “found data”

By alex | Published: 9/16/2006

This entry was posted in Uncategorized and tagged Privacy, Research.

One Trackback

By Go phish: Tom gets annoyed at FactoryCity on 1/23/2007 at 4:39 pm

[…] Thanks for clearing that up. I’m glad you’ve got things under control and are growing the service. It’s not like this is the first time those phishy people have attacked us. […]
