WebMynd and distributed archiving

TechCrunch has a writeup on the startup WebMynd, whose Firefox add-on records each website you visit to your hard drive, leaving you with something of a personal archive of the web. It's free to archive for a week, and $20 to archive for a year.
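For flavor, here is roughly what such an add-on could look like as a sketch against today's WebExtensions API. This is my own guess at a mechanism, not WebMynd's implementation; the file layout, the "downloads" permission, and the choice to write snapshots into the browser's download directory are all assumptions.

```typescript
// background.ts -- minimal sketch of a "save every page I visit" add-on.
// Assumes an MV2-style Firefox WebExtension with the "downloads" permission
// and a content script injected into every page. Not WebMynd's actual code.
declare const browser: any; // WebExtensions API object provided by Firefox

browser.runtime.onMessage.addListener((message: { url: string; html: string }) => {
  // Wrap the captured HTML in a blob URL and hand it to the downloads API,
  // which writes it under the user's download directory.
  const blob = new Blob([message.html], { type: "text/html" });
  const objectUrl = URL.createObjectURL(blob);
  const safeName = message.url.replace(/[^a-z0-9]+/gi, "_").slice(0, 100);
  void browser.downloads.download({
    url: objectUrl,
    filename: `web-archive/${Date.now()}_${safeName}.html`,
    conflictAction: "uniquify", // never clobber an earlier snapshot
  });
});

// content.ts -- runs in each page and ships the rendered DOM to the background
// script once the page has finished loading.
window.addEventListener("load", () => {
  void browser.runtime.sendMessage({
    url: location.href,
    html: document.documentElement.outerHTML,
  });
});
```

Note that the content script grabs the DOM the user actually saw rather than re-fetching the page, so archiving adds no extra requests against the host.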
This is a great idea, of course; I've been advocating it for years. I understand why this particular effort is commercial, but the idea is crying out for an open alternative. Why?
With a small addition, it solves some major problems with archiving the web:
1. Selection. The pages that are visited most often will, naturally, be archived most often. Popular material is not the only material worth archiving or caching, but because this approach requires no hand-selection (or rather, the selection is invisible, a by-product of ordinary browsing), it is a good alternative. A sketch of the per-user index this implies follows the list.
2. Bandwidth. No crawlers, just humans. Bandwidth isn't a huge concern for web hosts (though a large part of my own traffic is bots), but it certainly is for a centralized archiver, which has to try to suck the whole web down a single pipe.
3. Storage. Yes, storage has come down in price, but it’s still expensive. To paraphrase The Streets, a zettabyte don’t come for free.
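Here is the index sketch promised under point 1: a hypothetical shape for the per-user index such an add-on might keep. The field names, the hashing, and the Node-style import are my assumptions; the point is only that visit counts accumulate as a side effect of browsing, so selection costs the user nothing.

```typescript
// Hypothetical per-user archive index. Every visit bumps a counter, so the
// pages a user actually returns to accumulate the most snapshots -- selection
// as a by-product of ordinary browsing.
import { createHash } from "node:crypto";

interface IndexEntry {
  url: string;
  visits: number;        // how many times this user has loaded the page
  lastVisited: number;   // epoch milliseconds
  contentHash: string;   // SHA-256 of the most recent snapshot
  snapshotPath: string;  // where the HTML landed on the local disk
}

function recordVisit(
  index: Map<string, IndexEntry>,
  url: string,
  html: string,
  snapshotPath: string,
): void {
  const contentHash = createHash("sha256").update(html).digest("hex");
  const prior = index.get(url);
  index.set(url, {
    url,
    visits: (prior?.visits ?? 0) + 1,
    lastVisited: Date.now(),
    contentHash,
    snapshotPath,
  });
}
```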
The small addition, then: you would need an infrastructure for sharing those individual indexes, and a way to run P2P searches across the resulting archives. It's not a simple problem, but it is definitely a tractable one. The result would be a kind of holographic memory of the web: knock out half the internet, and it would still exist on the hard drives of the remaining accessible client machines.
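To make that small addition concrete, here is a hypothetical sketch of peers sharing their indexes and answering searches over them. The interfaces, the fan-out query, and the popularity ranking are all my invention; a real system would still need peer discovery (a DHT or gossip layer), authentication, and a privacy story for what each user is willing to expose.

```typescript
// Hypothetical P2P search across personal web archives.

interface SharedEntry {
  url: string;
  contentHash: string;  // identifies one particular snapshot of the page
  visits: number;       // the answering peer's own visit count
  peerId: string;       // who can serve this snapshot
}

interface Peer {
  id: string;
  search(term: string): Promise<SharedEntry[]>;          // query this peer's shared index
  fetchSnapshot(contentHash: string): Promise<string>;   // retrieve the archived HTML
}

// Fan a query out to every known peer and merge what comes back. Because
// popular pages sit on many disks, losing peers degrades coverage gracefully
// rather than destroying it -- the holographic property in miniature.
async function distributedSearch(peers: Peer[], term: string): Promise<SharedEntry[]> {
  const perPeer = await Promise.allSettled(peers.map((p) => p.search(term)));
  const merged = new Map<string, SharedEntry>();
  for (const result of perPeer) {
    if (result.status !== "fulfilled") continue; // unreachable peers simply drop out
    for (const entry of result.value) {
      const existing = merged.get(entry.contentHash);
      // Keep one record per snapshot, summing visit counts as a crude popularity score.
      merged.set(
        entry.contentHash,
        existing ? { ...existing, visits: existing.visits + entry.visits } : entry,
      );
    }
  }
  return [...merged.values()].sort((a, b) => b.visits - a.visits);
}
```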