George wants to look at hyperlink networks between countries, using AltaVista. Others have done this, but I was very resistant. It seemed to me that AltaVista probably had significant gaps in its coverage of non-English pages, and in its coverage generally. Furthermore, I wasn’t sure you could trust the “we found 34,222,534 links” figures. Further still, looking only at ccTLDs seemed to really miss the boat, since most new .com registrations are from outside the US. So, I figured: more power to George. But he has been doing a lot of work to show that the approach obtains reasonable levels of reliability and validity.

Then I found out he was doing these checks by hand, or rather one of his grad students was. Argh! So I wrote a short script to do this for him. You just provide a plain text file listing the domains you want to check for links between (one domain per line), and it returns a comma-separated file containing a matrix in which each cell gives the number of links from one domain to another. George is using this for country-to-country measurements, but there is no reason it couldn’t be used at other levels, to see how Buffy the Vampire Slayer sites were internetworked, for instance. (I was planning on linking, but it seems it has been defaced by Muslim Bush-haters, who apparently think they will reach the Republican elite by hacking a Buffy site…)
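For anyone curious about the general shape of such a script, here is a minimal sketch in Python. It is not the script itself: the function names are made up for illustration, and the actual search-engine query is left as a placeholder callable (`count_links`) that you would have to supply, since how the real script talks to AltaVista isn't shown here. The sketch only covers the two ends described above: reading one domain per line, and writing the domain-by-domain matrix as comma-separated text.

```python
import csv
import sys


def read_domains(lines):
    """Return the non-empty, whitespace-stripped domains, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def link_matrix(domains, count_links):
    """Build a square matrix of link counts.

    Cell [i][j] holds count_links(domains[i], domains[j]), i.e. the number
    of links from the row domain to the column domain. count_links is a
    hypothetical callable supplied by the caller; it stands in for the
    real search-engine query.
    """
    return [[count_links(src, dst) for dst in domains] for src in domains]


def write_matrix_csv(domains, matrix, out):
    """Write the matrix as CSV with a header row and a label on each row."""
    writer = csv.writer(out)
    writer.writerow([""] + list(domains))
    for src, row in zip(domains, matrix):
        writer.writerow([src] + list(row))


if __name__ == "__main__":
    # Placeholder counter with dummy values, NOT real link data: it only
    # demonstrates the plumbing until a real query function is plugged in.
    def dummy_count(src, dst):
        return 0

    demo_domains = read_domains(["us\n", "kr\n", "jp\n"])
    write_matrix_csv(demo_domains, link_matrix(demo_domains, dummy_count), sys.stdout)
```

Keeping the counter as a separate function means the same matrix code works whether the rows are ccTLDs or individual fan sites; only the query changes.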

If you have Python installed on your system, the source is here: If you are on a Windows system without Python, download and unzip
