[the making of, pt. 1] Of sausages and research papers

I’ve been meaning to do this for a while. When I get questions from students and colleagues, they rarely concern the things I’ve found, but rather how I found them. How did you get that data and make sense of it? I do a poor job of explaining this, and many are unhappy to learn there isn’t a simple software tool that can accomplish what they want.

One of the advantages of blogging is the chance to peer into other people’s lives and, as a result, learn a bit more about what they do. Academics can (and often do) use blogs to work through ideas, but the process of doing research and writing is more rarely thrashed out in public. I’ve decided to do that for a paper I am writing, providing you with a blow-by-blow feel for how I do research, and how I write.

What I describe in the following posts isn’t the only way to do things, and it’s not even the normal way I do things, if such a thing exists. The process will change depending on the object of study, the type of work, and the team (or individual, as in this case) working on it. But hopefully it will provide you with some idea of how I go from nothing to a (hopefully interesting) research article.

There are dangers here. The first and foremost is looking like a complete idiot. But I’m kind of used to that, so I’m not going to worry too much about it. Hopefully, I’ll learn something from the process. Another is that it messes with blinded peer review, but I’ve found that is already pretty messed up. If you have questions or comments, I’m very happy to hear them!


The first step in any piece of research is thinking. I know that sounds obvious, and I only wish that it were. The idea is to come up with a piece of “specified ignorance,” as Merton put it: something we should know that we do not know. (Note that this is different from “I can use method X and theory Y; now I’ll apply it to yet another set of easily found data.”) It should also be something that you find to be exciting. Others will tell you that there are other requirements: that it be part of a sustained research agenda, that it be fundable, that it be of current interest to the profession, etc. I won’t disagree with any of that, but I don’t personally care very much about those things. I’m generally taken by fairly disparate kinds of issues and just let my curiosity get the best of me. Unsurprisingly (at least to me), these end up clustering together into some form of research agenda on their own.

Usually, I am inspired by “that’s cool” moments. A year or more ago, I found a lot of my time taken up by the Digg site. As it happens, I think that it and similar social filtering sites are really important to how the web works today, and we need to understand them better in general. I guess you could say they are becoming a stream in my research. But at the time, I just thought a tool from Neaveru that let you view your posting history was pretty cool, and that it suggested some patterns in people’s behavior. Despite a self-image that claims not to care what other people think, I got pleasure from being widely “dugg,” and was frustrated when my comments were dugg down.

This got me thinking about the new explicit kinds of ratings of people that seem so common on the social web, from Technorati to Compare People to numbers of Twitter followers. All of this starts to feel a bit like distributed whuffie.

Of course, I’ve been thinking about related stuff for a while. And this new interest echoes back strongly to a bit of a side-track I engaged in for one of the chapters of my dissertation (pdf). I never did the “make your dissertation a book” thing; I was sick of it at the time, and now I wish I had done more with it after graduating. Anyway, I can at least fall back on some of the ideas I engaged with there, including an analysis of how experience on Slashdot led people to learn to get more votes up from moderators.

Provisional Research Questions

So, I know I generally want to look at the function of the ratings of comments on Digg, and how they might be related to posting behavior. Although I’ve never been a fan of applying the strict hypothesis model to the social sciences, I do think you need a clear sense of how you want to measure and operationalize your ideas. First, I need to narrow things a bit. Note that I expect that some of these ideas may not make it to the finished product (i.e., they may end up being stupid questions), and I may run across other things in the process that make me revisit my initial ideas.

I suspect, based on the work I did with Slashdot, that people learn to get better “grades” the longer they post on Digg. I’m not sure what “longer” really means, whether it is number of posts or amount of time, but it’s possible to check both. In the case of Slashdot, there appeared to be a “learning period” during which there was an increase in average post ranking. After this learning period, some people continued to post popular comments, while others abandoned popularity for obscurity, or actively posted trolling comments that would get ranked down.
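A “learning period” like the one on Slashdot could be checked by averaging comment scores by post index (each user’s 1st, 2nd, 3rd comment, and so on) and looking for an early rise. A minimal sketch, assuming the scraped data can be reduced to (user, timestamp, score) tuples — the field layout here is my own, not Digg’s:

```python
from collections import defaultdict

def learning_curve(comments, window=10):
    """Average comment score by post index (1st, 2nd, ...) across users.

    `comments` is a list of (user, timestamp, score) tuples; only each
    user's first `window` comments are counted.
    """
    by_user = defaultdict(list)
    for user, ts, score in comments:
        by_user[user].append((ts, score))

    sums = defaultdict(float)
    counts = defaultdict(int)
    for posts in by_user.values():
        posts.sort()  # order each user's comments by time
        for i, (_, score) in enumerate(posts[:window]):
            sums[i] += score
            counts[i] += 1
    return [sums[i] / counts[i] for i in sorted(counts)]

# Toy data: one user whose scores rise with experience, one flat user.
data = [("a", 1, 0), ("a", 2, 2), ("a", 3, 4),
        ("b", 1, 1), ("b", 2, 1)]
print(learning_curve(data))  # [0.5, 1.5, 4.0]
```

A rising front end of that curve, flattening later, would be the signature of a learning period; running it with post count versus elapsed time as the index would distinguish the two senses of “longer.”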

The response of fellow Diggers to a comment also calls to mind something approaching a Skinner box, with small treat rewards for conforming behavior. I wonder if there is a direct measurable effect of positive or negative reinforcement of posting behaviors.

And, in a “mushier” sense, I’m wondering if there are traits that make it more or less likely for a comment on Digg to be voted up or down. I suspect that if I tried posting in French or Japanese, or just posted total nonsense unrelated to the discussion, this would get me voted down. So, being on-topic and writing in a language and style that can be comprehended seem important to a good Digg comment, but can we find certain kinds of things that tend to improve your chances? On Slashdot, humorous remarks were key to high rankings, for example.

So, I’m interested in three things (in a slightly different order than they are addressed above):

  1. Is a highly ranked post more likely to encourage a new post sooner? (And perhaps: is dropping out of posting related to consistently being Dugg down?)
  2. Do people tend to learn how to post more successfully over time?
  3. What content seems to result in the highest ranking?
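For the reinforcement question, one rough operationalization is to pair each comment’s score with the delay before that user’s next comment; a rank correlation over those pairs would hint at whether high scores speed up the next post. A sketch under the same assumed (timestamp, score) layout, which is mine and not Digg’s:

```python
def score_vs_gap(posts):
    """Pair each comment's score with the time until the same user's
    next comment. `posts` is a time-sorted list of (timestamp, score)
    tuples for one user; returns (score, gap) pairs. The last comment
    has no successor, so it contributes no pair."""
    return [(score, posts[i + 1][0] - ts)
            for i, (ts, score) in enumerate(posts[:-1])]

pairs = score_vs_gap([(0, 5), (2, -1), (10, 3), (11, 0)])
print(pairs)  # [(5, 2), (-1, 8), (3, 1)]
```

A negative correlation between score and gap (high score, short gap, as in the toy output) would be consistent with the Skinner-box hunch; “dropping out” could be treated as a censored gap at the end of each user’s history.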

From here

The truth is that my research and writing process is often pretty muddled, and I cannot promise that this will be any less the case this time. That said, in the following parts, I’ll address the process of assembling a brief literature review, writing up the method, collecting the data, analyzing the data, writing up the research, and going through the conference presentation and publication process.

Next: Part 2: Assembling a literature review



  1. Posted 10/1/2008 at 3:38 am | Permalink

    I came across your blog while trying to sort out the untagged feeds in my too-long Google Reader list; I must have somehow subscribed to it some time ago, but I can’t remember. Anyway, what a great surprise!

    Both the tone of your blog, your approach to research, and your topic are spot on for me; I’ve been stuck in the hell of procrastinating on Digg, and you are pulling me out.

    One question on your approach (I’ll read your next posts in a minute): a higher rank is a reward, and there are truckloads of studies on the effects of rewards (that’ll be your next post, I’m sure). But while the consensus is that the effect is globally positive, most agree it’s not a uniform effect: there is such a thing as resting on your laurels, which gets mixed in with the encouragement. Timing is of the essence, and your question seems to blur that, although by measuring time between posts, you should be able to sort out the dynamics of posting sessions.

    Congrats on the Captcha.

  2. Posted 10/2/2008 at 5:28 am | Permalink

    I was right to believe that the influence isn’t monotonic, and I was wrong to assume you wouldn’t detail that with the data.

    I don’t know if your research idea was influenced by this paper, but you might want to put alerts on Huberman’s work:

    Crowdsourcing, Attention and Productivity.
    Bernardo A Huberman, Daniel M Romero and Fang Wu

    Abstract: The tragedy of the digital commons does not prevent the copious voluntary production of content that one witnesses in the web. We show through an analysis of a massive data set from YouTube that the productivity exhibited in crowdsourcing exhibits a strong positive dependence on attention, measured by the number of downloads. Conversely, a lack of attention leads to a decrease in the number of videos uploaded and the consequent drop in productivity, which in many cases asymptotes to no uploads whatsoever. Moreover, uploaders compare themselves to others when having low productivity and to themselves when exceeding a threshold.

  3. alex
    Posted 10/2/2008 at 8:41 am | Permalink

    Managed to completely miss that in my search of the literature. (And in general, which is scary itself since I’m pretty interested in this stuff.) Thank you!

One Trackback

  1. […] is the sixth in a series of posts about the piece of research I am doing on Digg. You can read it from the beginning if you are interested. In the last section I showed a correlation between how much of a response […]
