[The following was almost liveblogged, but the computer ran out of juice just before it was ready to publish. I am at the University of Sussex, at the Association of Internet Researchers annual conference, and Internet access at this conference seems to be bad every year. Hopefully, next year in Chicago will be better. Over the next few days, I’ll try to put up a backlog of posts. Also, if you are waiting on grades or other responses from me, I’ll be catching up this weekend when I get back to the States.]
I am just coming off a short lunch break at the meeting on web archiving in the board room of the British Library. As I’ve noted before, I am not much one for the whole “liveblogging” thing; I like to give things a chance to stick. Unfortunately, I had to walk off an airplane straight into this meeting (I could really use a shower!), so I’m afraid I might forget some of this. There is a great group of researchers here, and some interesting ideas are being passed around. I expected my blog-centric views on archiving to find little audience among the library-centric folks here. There is, in fact, a difference in the way librarians and more blog-centric people think about archiving, but there is more interest in sharing ideas than I expected. The major difficulties, if any, seem to revolve around vocabulary (e.g., what constitutes “metadata”). Many of the ideas I presented in my talk had already come up in some form. The thing people seemed most interested in was Furl.
Lots of people presented their ideas. I probably should go into more detail here, but I’m not going to. Steve Schneider talked a bit about his experience coordinating thematic archives, and about systems that facilitate this process. Paul Koerbin talked about the Pandora project. There were lots of good comments from folks. Pierre Levy once again presented his ideas toward a universal semantic category system.
After lunch, we are discussing some of the “use cases” the national archive will work on. We quickly get into some pretty wild requests. This is an interesting approach: demonstrating the cases and getting feedback rather than talking about specifications. Frankly, though, the front end is less important to me than revealing the internals.
The main hangup seems to be how to select what should be archived. It’s a little like “what three records would you take with you if you were going to be stranded on an island?” What on the Web really matters enough that the national libraries should be going out and saving it? Steve Schneider noted later at the conference that the real tension is between the prospective needs of scholars and their retrospective needs. The retrospective needs are really important, but extremely hard to predict. My opinion is that the best thing to do is to be very receptive to what scholars want archived now, and hope that this also serves future needs.
A comment by Torill Mortensen got us talking about archiving not just the web, but the total experience of the web. At a very basic level, this means holding on to the software being used, either by storing machines or by emulating them. But she was particularly interested in how to document the kind of information and social ecology that exists around the web. Many of us have been using broadband exclusively for so long that we can’t even imagine dial-up speeds (or the old switchable 110/300 bps modems). We need to be able to understand what the web was, not just what it contained.