Sampling from

Obtaining a sample of blogs is, as with sampling on the web in general, not as easy as it should be. One way I’ve done this in the past is to sample from a “ping server.” Most weblog systems and services send pings out to updating services to let them know that new content has been posted. One of these servers is located a Moving forward, I hope to use, now that it has been bought up by Yahoo and is back in business. They’ve recently OKed my access, but I need to change my system a bit to draw pings from that server.

The last time I pulled from, it worked pretty well. Basically, I am working with a couple of other people to content analyze a relatively representative sample of weblogs. So, if I gather all the pings from a week from, and pull random blogs from that, I should have a decent sample. I have a couple of restrictions: that it be apparently single-author, that it be written primarily in English, and that it not be primarily commercial in purpose. This necessitates sifting through the sample by hand, and I’m doing this 100 blogs at a time.
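The sampling step described above can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: it assumes the week of pings has already been saved as a plain list of URLs, and the names `blog_key` and `draw_batch` are made up for this example. Deduplicating before drawing is an assumption too, added so that a blog that pings many times in a week is not more likely to be picked than one that pings once.

```python
import random
from urllib.parse import urlsplit


def blog_key(url):
    """Reduce a pinged URL to a rough per-blog key (hypothetical helper).

    Lowercases the host and drops a trailing slash so that trivially
    different forms of the same address collapse to one key.
    """
    parts = urlsplit(url.strip())
    return parts.netloc.lower() + parts.path.rstrip("/")


def draw_batch(ping_urls, batch_size=100, seed=None):
    """Deduplicate pinged URLs, then draw a simple random sample.

    Sampling is without replacement, so no blog appears twice in a batch.
    """
    unique = {}
    for url in ping_urls:
        # Keep the first URL seen for each blog key.
        unique.setdefault(blog_key(url), url)

    # Sort so that a given seed always yields the same batch.
    candidates = sorted(unique.values())
    rng = random.Random(seed)
    return rng.sample(candidates, min(batch_size, len(candidates)))


if __name__ == "__main__":
    # Placeholder URLs, not real ping data.
    pings = [
        "http://a.example.com/",
        "http://A.example.com",
        "http://b.example.com/blog/",
        "http://c.example.com",
    ]
    for url in draw_batch(pings, batch_size=2, seed=1):
        print(url)
```

Each batch of 100 drawn this way would still need the hand sifting described above; the code only covers the random draw.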

I was prepared for the number of splogs (spam blogs). It doesn’t take much to notice that spam has infested the blogosphere. I was less prepared for the number of Asian-language blogs, particularly Chinese.

I kept an informal count of the last 100 I looked at. Of these, only 23 met the requirements I laid out above. 44 were splogs, 20 were written primarily in Chinese, and 12 in another language (four in Japanese, a couple in Thai, a couple in Farsi, one Russian, one Portuguese, and two in Spanish). I think it would be a mistake to extend this and suggest that 20% of the blogosphere is Chinese, but from an informal, personal perspective, the increase in Chinese-language bloggers is striking.
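To put some numbers on why a single batch of 100 should not be stretched into a claim about the whole blogosphere, here is a small Python sketch that turns the tallies above into shares with a 95% Wilson score interval. The counts are the ones reported above (as listed they add up to 99 of the 100 reviewed); the Wilson interval is my choice for the illustration, not something used in the original count, and the function name is made up.

```python
import math

# Tallies as reported in the post; they sum to 99 of the 100 blogs reviewed.
counts = {
    "qualifying": 23,
    "splogs": 44,
    "Chinese": 20,
    "other": 12,
}
reviewed = 100


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for label, k in counts.items():
        low, high = wilson_interval(k, reviewed)
        print(f"{label:>10}: {k:3d}/{reviewed}  95% interval {low:.0%} to {high:.0%}")
```

With only 100 blogs in the batch, the interval around the splog share is roughly ten percentage points on either side, and that is before accounting for any bias in which blogs ping the server in the first place.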



  1. Posted 4/10/2006 at 3:53 pm

    Wow, that’s really useful. Are they willing to give other academics access on similar terms? Where do I sign? :-)

    Also, do you know which blogging platforms ping as a default setting for all posters?

  2. Posted 4/10/2006 at 4:00 pm

    Oh, I think they take all comers. You just have to have a static IP that you are coming from. Depending on how nice your university is about doing that, it’s not a huge problem, though I’ll say that with more certainty once I have it up and running. has the advantage of being a “meta” ping server, collecting pings from and others. I know the major blogging hosts ping, but for self-hosted blogs, you usually have to indicate where you want it to ping to.

  3. Pranam Kolari
    Posted 4/11/2006 at 7:05 pm

    We made a fairly formal study on last December. Our estimate was that close to 75% of pings from English blogs were actually spings, that is, pings from splogs. Those numbers seem to be close to your manual analysis of around 44/67. Check our post on splogs, if you get a chance.

  4. Posted 4/11/2006 at 7:35 pm

    Actually that was 44 of 100, but it’s possible (a) this was an outlier, (b) I didn’t include splogs that were not in English, (c) I didn’t include those without entries–they were elsewhere, and (d) some slipped through and will be caught later.

    Interesting stuff there, by the way, especially concerning language. Though I suspect is fairly Anglo-biased for pings (as Kevin suggests), no?
