Big time word freq!

Anyone who knows what I’ve been working on lately knows that it has to do with changes in word frequency over time. I’m using this to analyze differences in newspaper coverage, to identify salient changes in hate speech sites, and to look at the “social weather” of blogs. It looks as though I am not alone:

Jon Kleinberg, at Cornell University in New York, has developed computer algorithms that identify bursts of word use in documents.

While other popular search techniques simply count the number of words or phrases in documents, Kleinberg’s approach also takes into account the rate at which the word usage increases. (New Scientist)
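
To make the distinction in that blurb concrete, here is a minimal sketch of the difference between counting words and looking at how their usage changes between two points in time. It is not Kleinberg's algorithm, which is considerably more sophisticated, and it is not the script mentioned later in this post; it is only a naive illustration. The function names (`relative_freqs`, `frequency_changes`, `top_movers`) and the simple regex tokenizer are assumptions made up for this example.

```python
import re
from collections import Counter


def relative_freqs(text):
    """Map each lowercase word to its share of all words in `text`."""
    # Hypothetical tokenizer: runs of letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def frequency_changes(earlier, later):
    """Return {word: change in relative frequency} from `earlier` to `later`."""
    before = relative_freqs(earlier)
    after = relative_freqs(later)
    vocabulary = set(before) | set(after)
    return {w: after.get(w, 0.0) - before.get(w, 0.0) for w in vocabulary}


def top_movers(earlier, later, n=3):
    """Return (gainers, losers): up to `n` words that rose or fell the most."""
    changes = frequency_changes(earlier, later)
    # Sort by size of change, breaking ties alphabetically for stable output.
    rising = sorted(changes.items(), key=lambda kv: (-kv[1], kv[0]))
    falling = sorted(changes.items(), key=lambda kv: (kv[1], kv[0]))
    gainers = [word for word, delta in rising[:n] if delta > 0]
    losers = [word for word, delta in falling[:n] if delta < 0]
    return gainers, losers
```

For example, `top_movers("war war peace talks", "peace peace peace talks")` returns `(['peace'], ['war'])`: "talks" appears equally often in both texts, so a plain count would notice it, but it does not register as a change. Using relative rather than raw frequencies is a design choice that keeps texts of different lengths comparable.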

Kara pointed this out to me on Slashdot, and my first reaction was a bit gut-wrenching. It is always awful to think someone has beaten you to it. Some of my ideas, of course, appear in papers presented at AEJMC and at the AIR conference last year, but I’ve been too slow to get them out the door. I guess I’d better do so before it is too late. And some of this, as this short blurb suggests, is evident from other approaches. I came to this as a way of categorizing text that seemed to work well.

This leads to some interesting questions about self-disclosure on blogs. (Let me be clear at the outset: I have no pretensions that anyone got this idea from me!) I have talked a little about this, and put up a Python script for people to play with.

But I have kept a significant chunk of my work to myself. Part of this is that I know if I sketched it out, those with more time and programming skills would easily put it into practice (i.e., lazy web). So this is a very selfish thing to do. My livelihood is at stake: if others make use of my ideas before I do, I’m literally out of a job. So I am forced to walk a tightrope: I want to be “radically open,” but at the same time I have to recognize that timing is everything.

Anyway, that original gut-wrenching feeling (which wasn’t helped by a senior colleague noting that I was likely to go uncited in the literature if I published research in a similar vein) has given way to the security of knowing that someone far more respected than I am thinks the idea has merit. It’s better to be part of a small community doing similar work than to try to be a community of one.



  1. alex
    Posted 2/19/2003 at 8:22 pm | Permalink

    Here is a link to the paper. Turns out he’s doing something far more fancy. More power to him–though the result seems to be the same as my very simple approach. Maybe I’m missing something–wouldn’t be the first time.

  2. Posted 2/19/2003 at 8:42 pm | Permalink

    It seems to me that there’s a lot of protection involved in publishing nascent ideas in your blog. Once they’re published, they’re copyrighted (and if it’s an algorithm, it’s “prior art”), whereas if you keep them to yourself and get “scooped,” you have no recourse.

    What basis does your “senior colleague” have for saying you’d go unpublished? That sounds more like sour grapes than good counsel. Name a field of study in which there aren’t multiple people publishing varied approaches to analysis and understanding. If your research is sound, and your writing is good, there’s no reason to think you’d go uncited. (Especially if you take advantage of the connectivity to others in the field that weblogs and their ilk provide.)

  3. Posted 2/20/2003 at 8:45 am | Permalink

    I think you are right from an IP perspective, but I’m not sure there is the same reputational value. That is, if I have a blog entry that says “gee, it would be cool if you could collect the searches in a major search engine and display them as a list of gainers and losers,” and then Google does this, who gets the glory? Otherwise the halfbakery would be the publication venue of choice.

    As to the second point, the issue was not so much whether I would get published in the area: my work is in a similar direction but stands on its own. The issue is that folks are more likely to cite an established person in the field. Especially now that his grad student, Duncan Watts, is making the academic star circuit, Kleinberg is a recognized “name” and more likely to get cited. I don’t think the faculty member I mentioned was belittling my work–not at all–he was just observing what seems to be the case: meritocracy is the rule in academia, with a heck of a lot of exceptions :). If I had two equally good articles to use as examples of a method I was using, and had to choose one (for some bizarre reason), I would go with the author I think most of my readers would know.

One Trackback

  1. By mamamusings on 2/21/2003 at 7:46 pm

    synchronicity and collaboration
    A few months ago, I wrote about the way blogs allow us to see ideas emerge simultaneously in more than
