I was lucky enough to have dinner last night with Han Woo Park and his lovely wife, which is even better than having dinner with Han solo (… yeah, sorry, Han). He is in New York for the NetSci conference. He has a longstanding interest in hyperlink networks, and they continue to interest me as well. In fact, one of the most common questions I get over email concerns the mechanics of doing this kind of work. That is, the question is rarely about the theory or ideas behind examining hyperlink networks, since these are generally available in published work. Instead, it is about what sort of software to use, how to define things, where to get started, and how to do the analysis.
So, I have decided, despite too many projects already scheduled for this summer, to do a mini-investigation of a particular network, and blog my progress with an eye to details that might be useful for others doing this kind of work. This isn’t a cooking show: nothing has been baked ahead of time. It may, as too much of my research does, end in total disaster. I expect wrong turns.
I also expect it to take a while. This is something I will do in the spare 10 minutes I have, as a break between other things. So, progress may come a bit slowly. In part, I’m hoping this will be something I can point people to when they ask about some of these things.
I want it to be a real project, rather than just some demo stuff. I’ll upload any data I have as I move along.
What kind of hyperlink network do I want to investigate? I have a few projects waiting in the wings. One has to do with blogging behaviors, and though I am eager to start that project, some of the details there are probably not as generalizable. Instead, I want to look at more fixed, traditional websites. I am going to draw on some initial brainstorming I did with Maria Garrido exactly four years ago over wet burritos by San Diego Bay. (Damn, and I can’t even remember my students’ faces when I meet them outside of the classroom.)
We had finally capped off our work on mapping grassroots networks that were associated with the Zapatista movement. If you are interested in that work, you can find it as a chapter in the Cyberactivism book, or you can read an earlier version of that paper (pdf), presented at the Association of Internet Studies conference. We decided that the natural next step would be to compare this emerging internetworked set of sites for grassroots groups to the network of sites for the more traditional governmental and non-governmental organizations.
My working hypothesis is that these more established sites will feel more like the sites of traditional automobile companies in that they will attempt to remain “sticky” and not link outside. Since I will be collecting data from these sites, I’ll try to look not just at the hyperlinks, but at some other factors that might help to explain why the network is the way it is.
My rough plan of attack, though apt to change, is thus:
1. Do a very preliminary literature review. I’m actually going to talk about the mechanics of this as well, since it will give me a chance to point my students in this direction. I have to say that this will be a “light” literature review. Mainly, I want to (a) make sure I’m not doing something that has already been done and (b) see if there are any interesting observations that I might be able to fit into my own work.
2. Decide on what to collect (archive).
3. Decide on how to collect it, and collect it.
4. Do some manipulations on my local data that will allow me to analyze the network (assuming there is one!).
5. Do some loose exploratory analysis of the network.
6. Look for possible explanations.
7. Write it up and present it, if the outcome is worthy.
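To make steps 3 through 5 a little more concrete, here is a minimal sketch in Python of one way the local manipulation might go, assuming the pages have already been archived as HTML strings keyed by their URL. It is not a description of the tools this project will actually use. The function names (`site_of`, `build_site_network`, `external_share`) and the `example.org` style addresses are hypothetical, made up for illustration, and treating each host name as one "site" is a simplifying assumption.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Reduce a URL to its host name, dropping a leading 'www.' (an assumption)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def build_site_network(pages):
    """Turn {page_url: html} into site-level link counts.

    Returns (edges, internal): edges[(src, dst)] counts links between
    different sites, internal[src] counts links that stay on the same site.
    """
    edges = defaultdict(int)
    internal = defaultdict(int)
    for page_url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        src = site_of(page_url)
        for href in parser.hrefs:
            target = urljoin(page_url, href)  # resolve relative links
            if urlparse(target).scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, and similar
            dst = site_of(target)
            if dst == src:
                internal[src] += 1
            else:
                edges[(src, dst)] += 1
    return dict(edges), dict(internal)


def external_share(site, edges, internal):
    """Fraction of a site's links that point to other sites (a rough 'stickiness' check)."""
    outbound = sum(n for (src, _), n in edges.items() if src == site)
    total = outbound + internal.get(site, 0)
    return outbound / total if total else 0.0


# Hypothetical archived pages, invented for this example.
pages = {
    "http://www.example.org/index.html": (
        '<a href="/about.html">About</a> '
        '<a href="http://example.net/">Partner</a> '
        '<a href="mailto:info@example.org">Mail</a>'
    ),
    "http://example.net/": (
        '<a href="http://www.example.org/">Back</a> '
        '<a href="http://example.com/page">Other</a>'
    ),
}

edges, internal = build_site_network(pages)
for (src, dst), count in sorted(edges.items()):
    print(src, "->", dst, count)
print("example.org external share:", external_share("example.org", edges, internal))
```

The resulting edge list could be loaded into whatever network analysis package one prefers. A low external share for a site would be loosely consistent with the "sticky" hypothesis, though a real analysis would need to decide how to handle subdomains, redirects, and navigation links repeated on every page.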
There are a couple of things to note here. I far prefer what some call “exploratory” work to the more traditional hypothesis testing that is the norm in the social sciences. I am generally curious about something and pursue it. That helps me to remain interested in the project. The downside is that I sometimes end up with garbage. Well-designed projects yield interesting results no matter what the data, but this one may not.