See, I said I would post soon!
Add this to the AOL leak: Someone ran across a collection of MySpace passwords, badly hidden on a phishing server. Again, a really interesting dataset (and no, he isn’t making the data available!), but tinged by the absolutely unethical and illegal method of collection. Of course, you could reasonably ask whether the passwords collected by a phishing attempt represent the average MySpace password. I seriously doubt that is the case.
If this becomes common, the difficulty is manifest. Researchers who hold to even minimal ethical standards will be left in the dust by “amateurs” (in the kindest sense of the word) and professionals (in the least kind sense) who do not recognize the ethical problems of using data that has been taken against the wishes of its owners. We’ve already seen this: I’ve posted about the issues around scraping social network sites and about the AOL data. But is this the future of online research: a sea of questionable datasets, traded on the black market, and unavailable to the researchers who would most benefit from them?
Most difficult is that the blogger posting above has arguably brought no harm to the individual users. He has analyzed the data in the aggregate and not revealed anything that directly impacts most users. Moreover, I don’t think he did much to violate their trust: the phishers did that, and then just left the data lying around. Nonetheless, I think this represents another case in which the researcher has to close her eyes to the data and just say no. It’s not an easy thing to do, though.
It raises another issue. While professionally we would clearly be barred from using the data in, say, a publishable paper, what about blogging about it? On the AOL data, I decried the invasion of privacy, and then turned around and blogged about it. I think it’s also clear that distributing results via a blog is no less damaging than publishing them in a research journal. Does that make my earlier post ethically questionable? What a mess.