Greg Elmer invited me up to Toronto to meet with a group of folks (namely Auke Touwslager, Charles Davis, Ganaele Langlois, Abby Goodrum, Nart Villeneuve, Yuya Kiuchi, and Rob King) interested in various forms of web mapping. We talked about several issues related to crawling, and in particular about a new web archiving project that Greg and Rob (among others, I guess) are involved in, called Webivore, that would allow researchers to more easily create continuing archives and share those archives.
One of the issues that came up was what seems to be a continuing de-linking of the web. Commercial sites have always been notorious for having few external links, but this seems to be spreading to other areas too. Some of this probably has to do with Google and search: browsing is no longer the chief way of navigating the web. But it means that hyperlink analysis, already a rather tenuous method in terms of validity, becomes even shakier. This becomes especially clear in the blogosphere, where links are becoming increasingly sparse (pdf) as blogrolls fall by the wayside. Those of us who are interested in the structure of the web increasingly need to look at the text itself and at other structural indicators, along with the hyperlinks.
Some of that can be done by looking for textual references on a page. For example, Liz Lawley recently referenced my name in a blog post rather than linking to me. By mining out such explicit references, especially to organizations, you might get somewhere. But I suspect that this approach leads, at best, to limited results.
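To make the idea of mining explicit references a little more concrete, here is a minimal sketch in Python. It assumes you already have the text of each page and a hand-built list of names to look for; the function name find_mentions, the page ids, and the sample text are all made up for illustration. It does nothing clever about nicknames, misspellings, or two people sharing a name, which is part of why I expect limited results.

```python
import re
from collections import defaultdict


def find_mentions(pages, names):
    """Return {page_id: set of names that appear in that page's text}.

    pages: dict mapping a page id to its plain text (assumed already extracted).
    names: list of person or organization names to search for.
    """
    # One case-insensitive, whole-word pattern per name.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in names
    }
    mentions = defaultdict(set)
    for page_id, text in pages.items():
        for name, pattern in patterns.items():
            if pattern.search(text):
                mentions[page_id].add(name)
    return dict(mentions)


# Hypothetical sample data, invented for this sketch.
pages = {
    "post-a": "I was reading what Jane Doe wrote about tagging last week.",
    "post-b": "Nothing relevant here, just a note about lunch.",
}
names = ["Jane Doe", "Example Institute"]

mention_edges = find_mentions(pages, names)
```

Each page-to-name pair could then be treated as an edge alongside whatever hyperlinks the crawl found, though how much weight a bare mention should carry relative to a link is an open question.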
Another possibility, though far more difficult, is to look for indications that posts seem to be talking about the same topic. Explicitly, this is what Technorati tags do, and a careful analysis of how these tags are used would be valuable. But I think even more important is a way of analyzing the implicit links between ideas and people. Tracing out indicators of such connections is an important step, but requires wide availability of data sets and a broader application of analytical tools.
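One very rough way to look for posts that seem to be talking about the same topic is to compare the words they use. The sketch below computes cosine similarity over raw word counts, using only the Python standard library. This is an assumption on my part about where one might start, not a method anyone at the meeting proposed; the function names are hypothetical, and a serious attempt would at least drop common words and weight rare ones more heavily.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count mappings (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty post has nothing in common with anything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical posts, invented for this sketch.
post_one = term_counts("Blogrolls are disappearing and links are sparse.")
post_two = term_counts("Sparse links make hyperlink analysis shaky.")
score = cosine_similarity(post_one, post_two)
```

Pairs of posts scoring above some threshold could be treated as implicitly linked, but picking that threshold, and deciding whether shared vocabulary really signals a shared conversation, is exactly the validity problem that already dogs hyperlink analysis.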