There has been a lot of talk lately about how the traditional media is converging in some way. If, for example, two newspapers were creating stories with substantially similar content, we would expect them to have increasingly similar distributions of word frequencies. That is, they would probably use the word “chicken” about the same number of times for every 10,000 words. Luckily, this is something that we can check.
A good target seems to be political coverage in large newspapers, right? If that’s getting more homogeneous, we’re probably in trouble. So, I grabbed articles from May to October of 1992, 1996, and 2000 that mentioned the names of both the Democratic and Republican nominees, in 8 newspapers: the Atlanta Journal-Constitution, the Houston Chronicle, the Chicago Sun-Times, the San Francisco Chronicle, the Boston Globe, the New York Times, the Washington Post, and USA Today. In total, 27,127 articles. Then I calculated the cosine similarity among the newspapers for each year (and reversed it to make a dissimilarity matrix). When folks do this they normally look for “unusual” terms by giving extra weight to terms that show up in one document but not another. I didn’t do this, because I was looking for raw differences (or maybe “distances”). I excluded articles that had any attribution to AP or the other wire services, and handled opinion and editorial pieces separately.
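For the curious, here is a minimal sketch of what that calculation might look like in Python. It is not the code I actually ran: the tokenizer, the function names, and the toy "papers" are made up for illustration, and I am assuming that "reversing" the similarity means taking one minus the cosine. Raw counts are used with no term weighting, which matches the "raw differences" idea; since cosine similarity ignores vector length, counts and per-10,000-word rates give the same answer.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count word tokens (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity between two raw word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty document is maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def dissimilarity_matrix(corpora):
    """corpora maps a paper's name to all of its article text for one year."""
    counts = {name: word_counts(text) for name, text in corpora.items()}
    names = sorted(counts)
    return {
        (p, q): cosine_dissimilarity(counts[p], counts[q])
        for p in names
        for q in names
    }


# Hypothetical toy data standing in for a year of coverage per paper.
toy = {
    "Paper A": "the chicken crossed the road",
    "Paper B": "the chicken crossed the street",
    "Paper C": "voters weighed the economy",
}
matrix = dissimilarity_matrix(toy)
for pair in [("Paper A", "Paper B"), ("Paper A", "Paper C")]:
    print(pair, round(matrix[pair], 3))
```

In the toy data, Paper A and Paper B share most of their words, so their dissimilarity comes out much smaller than the one between Paper A and Paper C.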
The convergence of news stories among the “national” newspapers was small but striking. The more local newspapers seemed to remain relatively dissimilar from one another during the same period. To get a feel for homogeneity, here is a graph of the distance of each paper from the mean of all newspapers for the year:
Wonder why the Boston Globe tracks so closely to the New York Times? Not hard to figure out. And if you got the feeling that USA Today came out of left field to become a “real” newspaper, you’re right. If I had included its 1992 number, everything else would have gotten squished down to almost nothing.
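A rough sketch of how a "distance from the mean" number could be computed, again in Python and again not the code behind the graph. I am assuming here that the "mean of all newspapers" is the average of the papers' relative word frequencies and that distance is one minus the cosine similarity to that average; another reasonable reading would be to average each paper's row of the dissimilarity matrix. The function names and the toy counts are hypothetical.

```python
import math
from collections import Counter


def relative_frequencies(counts):
    """Convert raw word counts to shares of the paper's total words."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity between two sparse frequency vectors."""
    dot = sum(v * b.get(word, 0.0) for word, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


def distance_from_mean(paper_counts):
    """Distance of each paper from the average paper for one year."""
    freqs = {name: relative_frequencies(c) for name, c in paper_counts.items()}
    vocab = set().union(*freqs.values())
    # Assumption: every paper gets equal weight in the mean, whatever its size.
    mean = {
        word: sum(f.get(word, 0.0) for f in freqs.values()) / len(freqs)
        for word in vocab
    }
    return {name: cosine_dissimilarity(f, mean) for name, f in freqs.items()}


# Hypothetical toy counts: two similar papers and one outlier.
toy_year = {
    "Paper A": Counter({"the": 4, "chicken": 2, "road": 1}),
    "Paper B": Counter({"the": 4, "chicken": 2, "street": 1}),
    "Paper C": Counter({"the": 1, "economy": 3, "voters": 3}),
}
distances = distance_from_mean(toy_year)
for name in sorted(distances):
    print(name, round(distances[name], 3))
```

The outlier paper ends up far from the mean while the two similar papers sit close to it, which is the same squishing effect an extreme value has on the scale of a graph.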
Coming soon (maybe): changes in editorials and the phrases that cause the greatest differences between a couple of the papers.