Three QU students arrested. Chronicle?

Three undergrads have been arrested with a drug stash in their dorm. Given the trouble our campus has had with drinking, you would think they might actually encourage something a bit less corrosive. (I’m only half kidding: security turned a fairly blind eye to marijuana use by students at some of the West Coast universities I know.)

So, the independent paper that the administration has declared part of the Axis of Evil has a story on the bust. Still waiting on the administration-backed paper, The Chronicle. I hope the delay in publishing is simply because they are using their direct access to the campus to do some hard-hitting investigative reporting.

Posted in Uncategorized

[the making of, pt. 4] Basic descriptions of the sample

This is the fourth in a series of posts about a paper I am writing, breaking down the process old-school. It started here. In part 3, I talked about how I got the sample of users (and waved my hands a bit about the sample of comments). Now I want to lay out the basic structure of the sample, for my readers and for myself.

Counting it up

I can say, for example, how many accounts I collected (30,000), when the oldest of these accounts was created (December 2004), and when the newest were created (cut off at May 2008). I also want to say something about how many of these users post comments, and how many comments I have in total.

The latter is pretty easy: I dump my comma-delimited file of comments into a plain text editor. I use Notepad2 for this sort of thing because it has line numbering (making my job easier) and, unlike the original Notepad, doesn’t crash a Windows system when you try to open very large files. In total, there are 197,658 comments.
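For anyone without Notepad2 handy, here is a minimal Python sketch of the same count. It is an illustration, not the script used for this paper, and the file name passed on the command line is hypothetical. One design choice worth noting: it counts CSV records with the standard csv module rather than raw lines, so a comment whose text contains a quoted line break is counted once, whereas an editor’s line numbers would count it twice. If the file has a header row, subtract one.

```python
import csv
import sys


def count_records(lines):
    """Count non-blank CSV records in an iterable of text lines (e.g. an open file).

    csv.reader stitches a quoted field that spans several physical lines
    back into one record, so embedded line breaks are not double-counted.
    """
    return sum(1 for row in csv.reader(lines) if row)  # skip blank lines


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python count_comments.py comments.csv
    # newline="" lets the csv module handle line endings itself
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        print(count_records(f))
```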

Distribution

So that’s a lot of comments: about 6.6 per account, on average. But we know it’s unlikely that many people post the “average” number of comments, or even that the counts are distributed normally around that average. Far more likely, a large number of people never post or post only rarely, and a handful of freaks (er, enthusiasts) post every two minutes or so. What we need to do is count up the comments by user.

So we turn to Python again and write a quick script that goes through the 197K comments and counts how many each user makes. In practice, the program doesn’t find all 30,000 users in our sample, because 23,532 of them have not posted a comment. The result is a comma-delimited file with each user name and that user’s number of comments. Now we can construct a histogram.
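A minimal Python sketch of that per-user tally, under stated assumptions: the user name is assumed to sit in the first column of the comments file, and the input and output file names are hypothetical command-line arguments. collections.Counter does the counting, and most_common() writes the most active users first, which puts the outliers at the top of the file.

```python
import csv
import sys
from collections import Counter


def comments_per_user(rows, user_col=0):
    """Tally comments per user; user_col=0 is an assumed column layout."""
    return Counter(row[user_col] for row in rows if row)  # skip blank rows


def write_counts(counts, handle):
    """Write one 'user,count' line per commenting user, most active first."""
    writer = csv.writer(handle)
    for user, n in counts.most_common():
        writer.writerow([user, n])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python count_users.py comments.csv user_counts.csv
    with open(sys.argv[1], newline="", encoding="utf-8") as src:
        counts = comments_per_user(csv.reader(src))
    with open(sys.argv[2], "w", newline="", encoding="utf-8") as dst:
        write_counts(counts, dst)
    print(f"{len(counts)} users posted {sum(counts.values())} comments")
```

Users who never commented never appear in the comments file, so they are simply absent from the tally rather than listed with a zero. With the figures above, that leaves 30,000 minus 23,532, or 6,468 commenters, averaging roughly 30.6 comments each.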

Histogram

I am a big fan of Excel, and we could use it to create the histogram, but I always seem to spend about 15 minutes relearning how to do histograms in Excel each time. The obvious choice is SPSS, but for a change of pace, I’m going to use a free statistics package called R.

The reason is simple enough. A quick look at a regular histogram shows that this is a heavily skewed, Pareto-style (power-law) distribution. Plotted as a regular histogram, it comes out as two lines along the axes and a tiny curve at the origin. One person actually made 6,598 comments, and I had to check the site to make sure there hadn’t been an error. Another posted over 4,000 comments.

So what we need is a log-log histogram. Although I’m sure there is a function that will do this for me neatly (I have to admit ignorance when it comes to doing this in SPSS, but I suspect it’s just a matter of checking a box), I’m once again going to turn to Python to write a script that computes the frequencies (i.e., how many people posted once, twice, … n times). I could “bin” these frequencies and come up with something that looks like a regular histogram, but since folks are not as used to seeing log-log bar charts, I decided to do it without the bins. The resulting file is just one number per line: the first line is the number of users with one comment, the second is the number with two comments, and so on. I drop this file into Notepad2 to take a look and copy all the data (CTRL-C).
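A minimal Python sketch of the frequency step, assuming its input is a user,count file like the one described above (count in the second column) and that the output goes to a hypothetical file named on the command line. It writes one line per comment count from 1 up to the maximum, inserting explicit zeros for counts nobody hit, so line k always holds the number of users with exactly k comments.

```python
import csv
import sys
from collections import Counter


def frequency_vector(per_user_counts):
    """Return freqs where freqs[k - 1] is how many users made exactly k comments.

    Comment counts that no user hit get an explicit 0, so position k in the
    list (line k in the output file) always corresponds to k comments.
    """
    tally = Counter(per_user_counts)
    top = max(tally, default=0)  # largest comment count seen (0 if no data)
    return [tally[k] for k in range(1, top + 1)]  # Counter gives 0 for misses


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python freqs.py user_counts.csv freqs.txt
    with open(sys.argv[1], newline="", encoding="utf-8") as src:
        counts = [int(row[1]) for row in csv.reader(src) if row]
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        for f in frequency_vector(counts):
            dst.write(f"{f}\n")
```

Keeping the zeros preserves the line-number-equals-comment-count layout; with a 6,598-comment maximum, this sketch’s output would run to 6,598 lines. On a log-log plot, R leaves out the zero values (with a warning), since the log of zero is undefined.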

I open up R, and first execute this command:

x <- type.convert(readClipboard(), as.is = TRUE)

This loads all of the data I just copied into a “vector” called x. If you are unfamiliar with R syntax, note that <- is the assignment operator: it says put the stuff on the right into the box on the left. The readClipboard function (shockingly) reads whatever is on the Windows clipboard. It returns everything as character strings, so type.convert converts those strings into numbers (integers, in this case). Now we have all this stuff in the vector x.

Next, I issue the following command:

plot(x, log="xy", xlab="log(number of comments)", ylab="log(number of users)")

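which produces the plot shown to the right. The options should be fairly self-explanatory: log="xy" puts both axes on a logarithmic scale, and xlab and ylab label the axes. Because x is a single vector, plot uses each value’s position (1, 2, 3, …) as its horizontal coordinate, which here is exactly the number of comments.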

Next: Hypothesis testing!

Now we have some basic descriptions of the data, enough to give the reader a feel for what we are working with. Time to rearrange the data a few more times and take measurements that will help us answer questions about the relationship of feedback scores to posting behavior, in part 5.

Posted in Uncategorized

Twittering the debates

It looks as though Current will be broadcasting the debates live with a Twitter overlay tonight, assuming there is a debate. Seems like an interesting (if distracting) way to watch.

Posted in Uncategorized

Does (American-style) Democracy Work?

“If they lie about us, then we will correct the record,” Obama said. “But this election is too important, too serious, to be playing silly games.”

The media has been all about the campaign lies this year. Factcheck.org, a site with an august history, is probably getting more hits this month than it ever has. I heard an interesting interview on Talk of the Nation yesterday about the legacy of Lee Atwater, the success of the Willie Horton ad, and its effect on Republican campaigning. The argument was that Republicans have no problem playing to the base instincts of the populace in order to win an election: all’s fair in love and war. Democrats have found no way to battle against this and, at least traditionally, have not engaged in the same kinds of behavior (coded race-baiting, fear tactics, wrapping yourself in the flag, etc.).

I had been tracking this and was deeply gratified that this election (I thought) would be different. I saw no way that McCain would turn to these tactics, after the way they had been used against him in the past by fellow Republicans. I’m still a little shocked at how wrong I was. Both campaigns have stretched the truth, but McCain goes way beyond the pale: he continues to claim, not just in ads but in interviews, long after nonpartisan groups have shown it to be an outright lie, that Obama is seeking to raise taxes for most people, or that he wanted sex ed for kindergarteners. He knows this, but has subscribed to the belief that it’s OK to lie as long as you win the election, a bargain that has worked well for the Republicans over the past three decades. It is perhaps most ironic given his “I’d rather lose the election” talk. Obviously, he’s willing to lie about the issues in order to win.

I’m not unsympathetic. After all, how do you convince the 95% of the voting population who would be helped by Obama’s tax cuts that they should instead pay more so that the top 5% can pay less? The richest 1% in America now pay far less tax than they did in the ’70s, while the middle class pays far more. The US has a ridiculous (and preposterously rising) Gini coefficient. Really, there are two choices: either lie now about your own policy (“Read my lips”), or tell the truth about your policies and lie about your opponent’s.

I talked to a colleague who was convinced that the economic situation means Obama can’t lose. I’m not so sure. The Republicans have a playbook that works, one that plays on the fears of undecided voters, who tend to be (sorry) pretty uninformed and lacking in common sense. They also have this odd schizophrenia in their appeal. A candidate who claims you aren’t really rich until you are making $5 million a year, and then suggests that Democrats are elitist, seems to be running on a campaign of “rationality is outdated.” And maybe it is.

But they do manage to collect a very large group of people who tend to be less educated, and less wealthy, than Obama supporters. In some ways, I am the typical Obama supporter: a professor with an income well above the national average. (Professors disproportionately support Obama.) Some of the things that educated folks like about Obama (that he is able to articulate clear policy positions rather than playing to fears and “values”) may end up losing him the race. Undecideds don’t care about issues; they want to see a fight, and the bloodier the better.

Democrats are beginning to oblige, much to my disappointment. In recent television ads, Obama suggests that if McCain were president, seniors would have lost their Social Security checks in the current Wall Street meltdown. This goes beyond stretching the truth: it’s just plain wrong. Future retirees may have lost a significant amount of their benefits, but that’s not what the ad says. It preys on the fears of a demographic Obama desperately needs to win. And what I thought looked like a Swiftboating effort seems to be taking hold, as does a call to release McCain’s medical records. (The latter is a borderline issue. I think most voters can handicap the likely survival rate of a man who would be the oldest president, who has already suffered medical issues, and who is already exhibiting some dementia, without detailed medical evidence that wouldn’t add much to the discussion anyway.)

On one hand, the idea that McCain/Palin would come into the White House based on a campaign of lies, fear, and doubt is extraordinarily depressing. If the Bush elections were not a clear enough indication that our campaigning system is broken, a McCain win would be. I am not one of those people who says “I’m moving to Canada.” Or, rather, I am, but that’s just because Canada seems like a nice place to live sometimes. However, if McCain wins, I will be disappointed not just in the outcome, but in the process that has allowed Americans to vote against their own interests, and the interests of the country. If McCain won because people believed in his policies, I would have far less of a problem; compromise is at the heart of a democracy. But to win on the basis of deception dishonors not only his own legacy, but the country he seeks to lead.

Now it seems that some in Obama’s camp are willing to jump down that slope with McCain, hitting back with untrue ads, even as Obama says he wants to stay above the fray. If Obama wins by misleading the public, it will likewise shake my faith in our democracy. There is enough true material to attack McCain on; there is no need to willfully misconstrue his remarks (as McCain has done with Obama’s remarks about pigs and lipstick; no one ever said Americans were not gullible) or make up policies. Just tell Americans the truth: that McCain wants to outlaw abortion and gay marriage; he won’t negotiate with enemies like Iran; he wants working-class Americans to pay more tax so that the rich and corporations can pay less; he is uninformed on the economy; he doesn’t know who is in charge in Spain, or that there is no Czechoslovakia, or that Iran isn’t training al Qaeda; he is self-made only in that he married into wealth; and he, like Bush, is proud of his lack of intellectual pursuit. All of these things are true, and they still speak to his inability to lead. Heck, clips of McCain himself saying tremendously stupid things are probably the best attack ad. No need for voiceovers or text: just show Americans what he has in store for them.

Or better yet, suck it up, correct the lies, and tell the American people what you are going to do to help get us out of this mess. The ideal: Obama is elected, and manages to do so without giving in to the mud-slinging that both candidates said they would avoid.

Posted in General | 5 Comments