WebMynd and distributed archiving
Sunday, January 27th, 2008
TechCrunch has a write-up on the startup WebMynd, a Firefox add-on that saves each website you visit to your hard drive. The idea is that you end up with a personal archive of your browsing. Archiving is free for a week and costs $20 for a year.
This is a great idea, of course; after all, I’ve advocated for it for years. I recognize the commercial nature of the effort here, but this is crying out for an open alternative. Why?
With one small addition (a way for users to share those individual archives), it would solve some major problems with archiving the web:
1. Selection. Obviously, the pages that are visited most frequently will be archived most frequently. Popular material is not the only material worth archiving or caching, but because this approach requires no hand-selection (or rather, the selection is invisible), it is a good alternative.
2. Bandwidth. No crawlers but humans. Bandwidth isn’t a huge concern for the web hosts (though a large part of my traffic is bots), but it certainly is for the archiver, who has to try to suck the web down a single pipe.
3. Storage. Yes, storage has come down in price, but it’s still expensive. To paraphrase The Streets, a zettabyte don’t come for free.
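The selection and storage points can be illustrated with a minimal sketch, assuming each client simply saves every page its user visits. The peer names, URLs, and browsing histories below are hypothetical. Nobody picks what to keep, yet the number of copies of a page ends up tracking how many people visited it, and each client stores only its own slice of the web.

```python
from collections import Counter

# Hypothetical browsing histories: each peer saves every page its user visits.
histories = {
    "alice": ["news.example/a", "blog.example/x", "news.example/a"],
    "bob": ["news.example/a", "wiki.example/y"],
    "carol": ["news.example/a", "blog.example/x"],
}

def local_archive(visits):
    """A peer's archive: the set of distinct URLs it has saved to disk."""
    return set(visits)

def copies_per_url(histories):
    """Count how many peers hold a copy of each URL."""
    counts = Counter()
    for visits in histories.values():
        counts.update(local_archive(visits))
    return counts

counts = copies_per_url(histories)
print(counts.most_common())
# [('news.example/a', 3), ('blog.example/x', 2), ('wiki.example/y', 1)]
```

The page everyone read has three copies; the page only one person read has one. That invisible, visit-driven selection is the point of item 1.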
What this means is that you would need to set up an infrastructure for sharing individual indexes, and then allow for P2P archive searches. It’s not a simple problem, but it is definitely a tractable one. The result would be a kind of holographic memory of the web: knock out half the internet, and it would still exist on the hard drives of the remaining accessible client computers on the web.
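Here is a toy sketch of sharing indexes and searching across peers, with all names (`Peer`, `build_shared_index`, `search`) hypothetical. For simplicity it assumes the individual indexes are merged into one in-memory lookup table. A real P2P design would also have to distribute that index (for example over a distributed hash table), which this sketch does not attempt. It shows the "holographic" property: taking half the peers offline still leaves widely visited pages retrievable, while a page held by only one offline peer is lost.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    """A hypothetical client holding its own archive of visited pages."""
    name: str
    pages: dict = field(default_factory=dict)  # url -> saved page content
    online: bool = True

    def index(self):
        """The shareable index: just the URLs this peer has archived."""
        return set(self.pages)

def build_shared_index(peers):
    """Merge individual indexes into url -> list of peer names holding it."""
    shared = {}
    for peer in peers:
        for url in peer.index():
            shared.setdefault(url, []).append(peer.name)
    return shared

def search(url, shared, peers_by_name):
    """Return the archived copy from the first reachable peer, or None."""
    for name in shared.get(url, []):
        peer = peers_by_name[name]
        if peer.online:
            return peer.pages[url]
    return None

peers = [
    Peer("p1", {"news.example/a": "<html>A</html>"}),
    Peer("p2", {"news.example/a": "<html>A</html>", "rare.example/z": "<html>Z</html>"}),
    Peer("p3", {"news.example/a": "<html>A</html>"}),
    Peer("p4", {}),
]
by_name = {p.name: p for p in peers}
shared = build_shared_index(peers)

# Knock out half the peers.
by_name["p1"].online = False
by_name["p2"].online = False

print(search("news.example/a", shared, by_name))  # <html>A</html> (p3 still has it)
print(search("rare.example/z", shared, by_name))  # None (only p2 had it)
```

Redundancy here comes entirely from overlapping browsing, so the archive is most resilient for exactly the pages people use most.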
The other course I’m teaching this semester, also a distance course, is formally titled “Communication, Media, and Society.” Is it only because I am in the field that I think that title might as well be shortened to “Stuff”? It’s hard to think of anything it doesn’t cover. In past incarnations, I’ve left it to the students to brainstorm a syllabus on the first day and then largely teach the course to each other. Most (though not all) really liked the outcome of that process, despite initial doubts. There may be a way to do that online as well, but I wanted to provide a little more structure the first time it is taught online.
You may have noticed I have abandoned the idea of going all audio here. I’ve also decided not to echo every post from my grad courses here, but instead to do more periodic posts. And the periodicity will be rather long, since this semester’s classes are set up on a two-week cycle, in order to prepare for the shift from 16-week semesters down to 7-week courses. I know: yikes.