WebMynd and distributed archiving

TechCrunch has a writeup on the startup WebMynd, whose Firefox add-on records each website you visit to your hard drive. The idea is to give you something of a personal archive. It’s free to archive for a week, and $20 to archive for a year.

This is a great idea, of course; after all, I’ve advocated for it for years. I understand the commercial motivation behind the effort, but this is crying out for an open alternative. Why?

With a small addition, it solves some major problems with archiving the web:

1. Selection. Obviously, those pages that are visited most frequently will be most frequently archived. It’s not the case that only the popular material is worth archiving or caching, but because this approach requires no hand-selection (or rather, the selection is invisible), it is a good alternative.

2. Bandwidth. No crawlers but humans. Bandwidth isn’t a huge concern for the web hosts (though a large part of my traffic is bots), but it certainly is for the archiver, who has to try to suck the web down a single pipe.

3. Storage. Yes, storage has come down in price, but it’s still expensive. To paraphrase The Streets, a zettabyte don’t come for free.

What this means is that you would need to set up an infrastructure for sharing individual indexes, and then allow for P2P archive searches. It’s not a simple problem, but it is definitely a tractable one. The result would be a kind of holographic memory of the web: knock out half the internet, and it would still exist on the hard drives of the client computers that remain accessible.
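
To make the idea concrete, here is a minimal in-memory sketch of shared indexes and a P2P archive search. Everything in it is an assumption for illustration: the `Peer` class, the `p2p_search` function, and the example URLs are hypothetical, nothing here reflects how WebMynd works, and a real system would need networking, peer discovery, and some notion of trust.

```python
import re
from collections import defaultdict


class Peer:
    """One browser's local archive plus a word index over it (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.pages = {}                # url -> archived page text
        self.index = defaultdict(set)  # word -> urls whose text contains it

    def archive(self, url, text):
        """Record a visited page and index its words.

        Sketch only: re-archiving a URL does not remove stale index entries.
        """
        self.pages[url] = text
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.index[word].add(url)

    def search(self, word):
        """Return the locally archived URLs that contain the word."""
        return set(self.index.get(word.lower(), set()))


def p2p_search(peers, word):
    """Ask every reachable peer and merge the answers.

    Returns a dict mapping each matching URL to the names of the peers
    that hold a copy, in the order the peers were queried.
    """
    holders = defaultdict(list)
    for peer in peers:
        for url in peer.search(word):
            holders[url].append(peer.name)
    return dict(holders)


if __name__ == "__main__":
    alice, bob, carol = Peer("alice"), Peer("bob"), Peer("carol")
    alice.archive("http://example.com/a", "Distributed archiving of the web")
    bob.archive("http://example.com/a", "Distributed archiving of the web")
    carol.archive("http://example.com/b", "A holographic memory")

    # All peers reachable: two copies of the page are found.
    print(p2p_search([alice, bob, carol], "archiving"))
    # Knock out a peer: the page is still found on a remaining one.
    print(p2p_search([bob, carol], "archiving"))
```

The redundancy comes for free from browsing habits: any page that more than one person has visited survives the loss of any single peer, which is the "holographic" property described above.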

    Hi Alex, thanks for your interest in WebMynd!

    To address a couple of the points you raise: truly distributed indexing and P2P retrieval is actually something we tossed around as a potential approach ourselves.

    There are a few reasons that we haven’t gone down that route. Firstly, as you rightly point out, it’s not a trivial problem to solve, and one we’d prefer to tackle once we have a more basic service available and working. Secondly, it would be a much more heavyweight install for our users compared to a small Firefox extension. Thirdly, we have some pretty exciting features in the pipeline to do with social browsing, intelligent categorisation and sharing – all things which really require a more centralised architecture.

    Also, you raise the possibility of an ‘open alternative’. Once this initial surge of users has become slightly less intense, we fully plan to stabilise and formalise our APIs to expose communal data, perhaps even opt-in individual data too. Obviously, if we go down the route of fatter client agents for P2P, it absolutely makes sense to make the APIs public in that case too.

    Thanks again from the WebMynd team!

    It’s a great idea… I’ve ranted about web browsers needing a visual way of retracing our surfing habits (even as web site thumbnails), but this seems more useful… like a form of residual memory (you referred to it as “holographic”). A web browsing version of Google Gears?

