Anyone who knows what I’ve been working on lately knows that it has to do with changes in word frequency over time. I’m using this to analyze differences in newspaper coverage, to identify salient changes in hate speech sites, and to look at the “social weather” of blogs. It looks as though I am not alone:
Jon Kleinberg, at Cornell University in New York, has developed computer algorithms that identify bursts of word use in documents.
While other popular search techniques simply count the number of words or phrases in documents, Kleinberg’s approach also takes into account the rate at which the word usage increases. (New Scientist)
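To make the distinction in that quote concrete, here is a minimal Python sketch that contrasts a plain count with a crude measure of growth between two periods. It is not Kleinberg’s algorithm, and it is not the script mentioned in this post. The function names, the smoothing constant, and the toy texts are all made up for illustration.

```python
from collections import Counter


def word_counts(text):
    """Count words after lowercasing and stripping surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)


def growth(earlier, later, smoothing=1.0):
    """Ratio of smoothed relative frequency (later / earlier) for each word.

    Hypothetical helper: add-one style smoothing keeps words that are
    absent from one period from causing a division by zero.
    """
    prev, curr = word_counts(earlier), word_counts(later)
    vocab = set(prev) | set(curr)
    prev_denom = sum(prev.values()) + smoothing * len(vocab)
    curr_denom = sum(curr.values()) + smoothing * len(vocab)
    return {
        w: ((curr[w] + smoothing) / curr_denom)
        / ((prev[w] + smoothing) / prev_denom)
        for w in vocab
    }


# Toy data, invented for this example.
earlier = "the the the war the economy election"
later = "the the the war war the election"

most_frequent = word_counts(later).most_common(1)[0][0]
ratios = growth(earlier, later)
fastest_growing = max(ratios, key=ratios.get)

print("most frequent in later period:", most_frequent)
print("fastest growing between periods:", fastest_growing)
```

In this toy data the most frequent word in the later period is “the,” while the word whose share grew the most is “war.” A count-only approach surfaces the first; an approach that looks at the rate of change surfaces the second.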
Kara pointed this out to me on Slashdot, and my first reaction was a bit gut-wrenching. It is always awful to think someone has beaten you to it. Some of my ideas, of course, appear in papers presented at AEJMC and at the AIR conference last year, but I’ve been too slow to get them out the door. I guess I had better do so before it is too late. And some of this, as this short blurb suggests, is evident from other approaches. I came to this as a way of categorizing text that seemed to work well.
This leads to some interesting questions about self-disclosure on blogs. (Let me be clear at the outset: I have no pretensions that anyone got this idea from me!) I have talked a little about this, and put up a Python script for people to play with.
But I have kept a significant chunk of my work to myself. Part of the reason is that I know that if I sketched it out, those with more time and programming skills would easily put it into practice (i.e., lazy web). So this is a very selfish thing to do. My livelihood is at stake: if others make use of my ideas before I do, I’m literally out of a job. So I am forced to walk a tightrope: I want to be “radically open,” but at the same time I have to recognize that timing is everything.
Anyway, that original gut-wrenching feeling (which wasn’t helped by a senior colleague noting that I was likely to go uncited in the literature if I published research in a similar vein) has given way to the security of knowing that someone far more respected than I am thinks the idea has merit. It’s better to be part of a small community doing similar work than to try to be a community of one.