Anjo Anjewierden, at the University of Amsterdam, has been doing some very cool processing of blog text to help provide a textured, understandable view of a given blog within the context of a set of weblogs. First, check out his approach to visualizing blogs as “Visual Settlements” along a number of dimensions. Naturally, such a visualization tends to focus on what the designer finds interesting, but it does provide a way of visually comparing and contrasting different styles of blogging. As a heuristic, it is certainly an interesting way to get at some of the qualitative differences among blogs.
I am reminded of something I recently read from Benoît Mandelbrot, as part of a collection of responses to the question “What Do You Believe Is True Even Though You Cannot Prove It?”
Wandering through the frontiers of the sciences, and the arts, I have always trusted the eye while leaving aside the issues that elude it. It can mislead — of course — therefore I check endlessly and never rush to print.
Meanwhile, for over fifty years, I have watched as some disciplines exhaust the “top down” problems they know how to tackle. So they wander around seeking totally new patterns in a dark and deep mess, where an unlit lamp is of little help.
But the eye can continually be trained and, long ago, I have vowed to follow it, therefore work “from the bottom up.” Like the Antaeus of Greek myth, I gather strength and persist by often touching the earth.
The second interesting approach, which he attempted on the way to the “visual settlements,” was to build a word vector for each of several sites and then look for the terms that show up often on a given site but rarely across the rest of the collection (using TF/IDF). The phrases that define my blog as different from others in the collection?
agent, assignment, association, authority, campus, candidate, citizen, communication technology, grad student, grade, graduate student, guest, journalism, nation, peer, period, porn, slashdot, terrorist, undergrad, undergraduate, venue, web page, wikipedia.
Actually, those terms really do a good job of explaining what makes up my blog.
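For anyone curious how that kind of list falls out, here is a minimal sketch of the TF/IDF idea in Python. This is not Anjo's code: the function name `distinctive_terms`, the toy `blogs` data, and the particular weighting (relative term frequency times the log of inverse document frequency) are my own assumptions, and a real pipeline would also need tokenizing, phrase detection, and stop-word handling.

```python
import math
from collections import Counter


def distinctive_terms(docs, top_n=3):
    """Rank each site's terms by a simple TF/IDF score.

    docs maps a site name to a list of already-tokenized terms
    (hypothetical input format, chosen for illustration).
    Returns a dict mapping each site to its top_n terms.
    """
    n_docs = len(docs)

    # Document frequency: in how many sites does each term appear?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    result = {}
    for site, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        # Frequent on this site (tf) but rare in the collection (idf).
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[site] = ranked[:top_n]
    return result


# Toy collection, invented purely for illustration.
blogs = {
    "blog_a": "campus grade student blog post campus".split(),
    "blog_b": "recipe kitchen blog post recipe".split(),
    "blog_c": "travel flight blog post hotel".split(),
}

if __name__ == "__main__":
    for site, terms in distinctive_terms(blogs, top_n=2).items():
        print(site, terms)
```

Words like "blog" and "post" appear on every site in the toy collection, so their inverse document frequency is zero and they drop to the bottom of every ranking, which is exactly why the list above reads as a fingerprint rather than a list of common words.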
At any rate, an interesting set of probes. Will be interesting to see where he goes with it.