Over the last several years, a number of researchers have written about the hyperlinked structure of the web and how it changes over time. It appears that the natural tendency of the web (and of many similar networks) is to link very heavily to a small number of sites; the web picks winners. Or, to be more accurate, the collective nature of our browsing picks winners. As users forage for information, they tend to follow paths that are, in the aggregate, predictable.
Huberman notes that not only are the surfing patterns of the web regular, but the structure of the web itself exhibits a number of regularities, particularly in the distribution of its features. The normal distribution found nearly everywhere, the familiar bell-shaped curve, also appears on the web, but for a number of features the web instead demonstrates a “power law” distribution. George Kingsley Zipf described a similar sort of power-law distribution (Zipf’s Law) among words in the English language, showing that the most frequently used English word (“the”) appears roughly twice as often as the second-most frequently used word (“of”), which in turn appears more often than the third-ranked word, and so on. (Yes, yes, I know: Zipf’s Law is stated in terms of rank, so it is not quite the same thing, but it is not different enough to matter for this discussion.) This distribution, in which magnitude is inversely proportional to rank, has shown up in a number of places, from the size of earthquakes to city populations.
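Zipf’s Law is usually written as f(r) ∝ 1/r^s with s close to 1, so the word at rank r appears about 1/r as often as the top-ranked word. The sketch below computes that prediction and a naive rank-frequency table from raw text. The helper names `zipf_expected` and `ranked_frequencies` are hypothetical, and the tokenizer is a plain lowercase whitespace split (an assumption for brevity, so punctuation stays attached to words).

```python
from collections import Counter


def zipf_expected(top_count: float, rank: int, s: float = 1.0) -> float:
    """Frequency Zipf's Law predicts for the word at `rank`, given the top word's count.

    Uses f(r) = f(1) / r**s; s = 1 is the classic form of the law.
    """
    return top_count / rank ** s


def ranked_frequencies(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first.

    Tokenization is a naive lowercase whitespace split, so punctuation stays attached.
    """
    return Counter(text.lower().split()).most_common()


# With s = 1, the second-ranked word is predicted to appear half as often as the first,
# the third-ranked word a third as often, and so on.
for rank in range(1, 5):
    print(rank, zipf_expected(1000, rank))
```

Running `ranked_frequencies` on a long English text and comparing each count with `zipf_expected(top_count, rank)` is a quick, informal way to see how closely a corpus follows the law; it is not a statistical fit.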
The number of “backlinks” (hyperlinks leading to a given page on the web) provides an example of such a distribution. If backlinks were normally distributed, we would expect a large number of sites to have about the average number of backlinks, and relatively few sites to have very many or very few. For example, if the average page on the web has 2.1 backlinks, we might expect a very large number of pages to have about two backlinks, and a relatively small number to have one or none. In practice, a very large number of pages have only a single backlink, a much smaller number have two, and a smaller number still have three. The average is as high as 2.1 because a small number of sites attract many millions of backlinks each. Were human height distributed in a similar fashion, with an average height of, say, 2.1 meters, we would find most of the globe’s population standing under a meter tall, apart from a handful of giants looking down at us from thousands of kilometers up in the sky.
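To see how a few heavily linked pages pull the average above what a typical page receives, the sketch below builds a hypothetical population in which the number of pages with exactly k backlinks falls off as 1/k^α, then compares its mean, median, and mode. The exponent α = 2, the population size, and the helper `power_law_population` are all assumptions for illustration; none of these numbers come from real web data.

```python
import statistics


def power_law_population(n_single: int, max_links: int, alpha: float = 2.0) -> list[int]:
    """Backlink counts for a hypothetical set of pages.

    Roughly n_single / k**alpha pages have exactly k backlinks, so n_single pages
    have one backlink and the counts thin out quickly as k grows.
    """
    pages: list[int] = []
    for k in range(1, max_links + 1):
        pages.extend([k] * round(n_single / k ** alpha))
    return pages


pages = power_law_population(n_single=10_000, max_links=1_000)
print("pages:", len(pages))
print("mean backlinks:", round(statistics.mean(pages), 2))
print("median backlinks:", statistics.median(pages))
print("most common count:", statistics.mode(pages))
```

In this toy population the median and the most common count are both 1, while the mean is more than three times higher, carried by the few pages in the long tail. A normal distribution with the same mean would instead put most pages near that mean.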
Huberman notes that this distribution is “scale-free”: the general shape of the distribution looks the same whether you examine the entire World Wide Web or just a small subset of its pages. I have been blogging for several years, and each blog entry ends up on its own page, often called a “permalink.” I looked at my last 1,500 posts to see how many backlinks each one received. The first figure to the right shows the ranked distribution of incoming links, excluding the top-ranked post. The vast majority of these 1,500 pages (1,372) have no incoming links at all. Despite this, the average number of backlinks (labeled “inlinks” in the figure) is 0.9, pulled upward by the top-ranked posts. Incidentally, as the second graph shows, the number of comments on each entry follows a similar distribution, with a very large number of posts (882) receiving either a single comment or none at all. To make these figures more legible, I have omitted the most popular post, entitled “How to Cheat Good,” which had attracted 435 backlinks and 264 comments by August 2007.
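One standard way to make “scale-free” precise, offered here as a sketch rather than as Huberman’s own formulation, is to write the share of pages with x backlinks as a power law with some exponent α > 1 (an assumed parameter; no value is given here) and compare it with a normal curve φ of mean μ and standard deviation σ:

```latex
p(x) = C\,x^{-\alpha}
\quad\Longrightarrow\quad
\frac{p(\lambda x)}{p(x)} = \frac{C\,(\lambda x)^{-\alpha}}{C\,x^{-\alpha}} = \lambda^{-\alpha},
\qquad
\frac{\varphi(\lambda x)}{\varphi(x)} = \exp\!\left(-\frac{(\lambda x-\mu)^{2}-(x-\mu)^{2}}{2\sigma^{2}}\right)
```

For the power law the ratio depends only on the zoom factor λ: going from 1 to 2 backlinks cuts the share of pages by the same factor 2^{-α} as going from 1,000 to 2,000. For the normal curve the ratio also depends on x, so the curve has a characteristic scale set by μ and σ. Taking logarithms, log p(x) = log C − α log x, which is a straight line on log-log axes at any range of x.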
One way to explain why such a distribution exists is to assume that there were a few pages at the beginning of the web, in the early 1990s, and that each year these sites have grown by a certain percentage. Since more pages are created each year, and each new page may link to existing ones, we would expect these older sites to have accumulated the most links over time. Such an explanation is as unlikely on the web as it is among humans. We do not grow more popular with every year that passes; indeed, youth often garners more attention than age. Some pages are created and quickly become “hits,” linked to from around the web. While this cannot explain their initial rise in popularity, many of these sites gain new backlinks precisely because they have already received a large number of them. Because of the structure of the web and typical browsing patterns, highly linked pages are likely to attract ever more links, a characteristic Huberman refers to as “preferential attachment.”
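The contrast between the “older sites just grow” story and preferential attachment can be sketched as a toy simulation. This is a minimal model in the spirit of the well-known Barabási–Albert (or Price) growth models, not Huberman’s own formulation: pages arrive one at a time, and each new page adds a single link to an earlier page. In the uniform variant every earlier page is equally likely to be linked, so only age helps; in the preferential variant the chance is proportional to backlinks + 1 (the +1 is an assumption that lets brand-new pages be found at all). The function `simulate_attachment` and its parameters are hypothetical.

```python
import random


def simulate_attachment(n_pages: int, preferential: bool, seed: int = 0) -> list[int]:
    """Grow a toy web and return each page's backlink count, in creation order.

    Every page after the first adds exactly one link to an earlier page.
    """
    rng = random.Random(seed)
    backlinks = [0]  # the web starts with one page and no links
    # For the preferential variant, each page appears in `pool` once, plus once per
    # backlink it has, so a uniform draw from `pool` picks a page with
    # probability proportional to (backlinks + 1).
    pool = [0]
    for new_page in range(1, n_pages):
        if preferential:
            target = rng.choice(pool)
        else:
            target = rng.randrange(new_page)  # any earlier page, equally likely
        backlinks[target] += 1
        backlinks.append(0)
        if preferential:
            pool.append(target)    # the linked page gains weight
            pool.append(new_page)  # the newcomer enters with weight 1
    return backlinks


uniform = simulate_attachment(10_000, preferential=False)
rich_get_richer = simulate_attachment(10_000, preferential=True)
for label, counts in (("uniform", uniform), ("preferential", rich_get_richer)):
    top = sorted(counts, reverse=True)[:5]
    print(f"{label:>12}: most-linked pages have {top} backlinks")
```

In the uniform run the oldest pages do collect somewhat more links, but the top counts stay small and close together; in the preferential run a few early winners pull far ahead, producing the kind of long tail described above. The exact numbers depend on the seed.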
Take, for example, my most popular recent posting. The earliest comments and links came from friends and others who read my blog regularly. Some of them linked to the post from their own blogs. Eventually, it came to the attention of several widely read blogs, including Michael Froomkin’s “Discourse.net” and Bruce Schneier’s “Schneier on Security.” Someone noticed it on the latter, and a link to it was posted on “Boing Boing,” a very popular site with millions of readers. Naturally, many people saw it on Boing Boing and linked to it as well, first from their blogs and gradually from other websites. Later, I received emails telling me that the page had been cited in a European newspaper and that a printed version of the post had been distributed to the faculty of a university department.
It is impossible for me or anyone else to know why this particular posting became especially popular, but every page on the web that becomes popular relies at least in part on its existing popularity to become more popular. The exact mechanism is unclear, but beyond some level of success, popularity in networked environments appears to become “catching.” The language of epidemiology is intentional: just as social networks transmit diseases, they can also transmit ideas, and the structures that support the two kinds of spread seem in many ways homologous.