archiving – A Thaumaturgical Compendium
https://alex.halavais.net
Things that interest me.

#g20 Tweets (https://alex.halavais.net/g20-tweets/)
Tue, 06 Oct 2009

I’ve created this blog entry mainly as a way of providing access to some files related to the work Maria Garrido and I have been doing on the Twitter conversation surrounding the G-20 meeting in Pittsburgh in September 2009.

Briefly, our aim was to examine tweets that included the hashtag #g20 and to figure out how a leadership structure may have emerged. However, the data may be useful to others as well.

If you make use of the materials you’ve found here, please cite this web page, including both Alex Halavais and Maria Garrido as authors.

Tweets and RT net

The core data is simply a collection of tweets gathered using The Archivist, based on a search for the hashtag #g20 from midnight September 20 to midnight September 29, 2009. The Archivist should be able to store results in both XML and CSV formats, but for some reason the CSV export did not always work. The files were also collected in various overlapping chunks to provide redundancy, and so had to be merged with duplicates removed. This zip file includes that data:

g20.zip [1.8 Mb]

Note that the tweets in g20tweets.csv are not sorted in any way, and the file lacks a header line. The fields are:

User, date/time, tweet ID, user image, status
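The original munging script is in the zip file and isn't reproduced here. As a rough illustration of the merge step described above, here is a minimal Python sketch. It assumes each chunk was exported as a headerless CSV with the five fields above in that order, and that the tweet ID is a reliable key for spotting duplicates. The function name `merge_chunks` and the UTF-8 encoding are assumptions, not details of the original script.

```python
import csv

# Column order of the headerless CSV described above (assumed stable across chunks).
FIELDS = ["user", "datetime", "tweet_id", "user_image", "status"]
ID_COL = FIELDS.index("tweet_id")


def merge_chunks(paths, out_path):
    """Merge overlapping CSV chunks into one headerless file.

    Duplicates are detected by tweet ID; the first copy seen is kept.
    Returns the number of unique tweets written.
    """
    seen = set()
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if len(row) != len(FIELDS):
                    continue  # skip truncated or malformed lines
                tweet_id = row[ID_COL]
                if tweet_id in seen:
                    continue  # already taken from an earlier, overlapping chunk
                seen.add(tweet_id)
                rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)  # no header line, like g20tweets.csv
    return len(rows)
```

For example, `merge_chunks(sorted(glob.glob("chunks/*.csv")), "g20tweets.csv")` would merge every chunk in a hypothetical `chunks/` directory. Keeping the first copy of each ID is an arbitrary choice; since the chunks overlap, any copy of a given tweet should be identical.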

The Python script (warning: IANAP) used to munge this together is included, along with a .net file of the retweet network, which can be loaded and massaged in Pajek, and the script used to make that file.
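The retweet network could be built along the following lines. This is a hedged sketch, not the included script: it assumes retweets follow the manual `RT @username` convention, that the user column holds a bare screen name, and that each arc points from the retweeter to the retweeted user, weighted by how many times that happened. The function names are hypothetical. The output uses Pajek's plain-text .net layout: a `*Vertices` list of numbered, quoted labels followed by weighted `*Arcs`.

```python
import csv
import re
from collections import Counter

# Manual retweets of the form "RT @username", anywhere in the status text.
RT_PATTERN = re.compile(r"\bRT\s*@(\w+)", re.IGNORECASE)


def retweet_edges(csv_path):
    """Count (retweeter, retweeted) pairs from the headerless tweet CSV."""
    edges = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 5:
                continue  # skip malformed lines
            user, status = row[0].lower(), row[4]
            for source in RT_PATTERN.findall(status):
                edges[(user, source.lower())] += 1
    return edges


def write_pajek(edges, out_path):
    """Write a weighted, directed network in Pajek .net format."""
    names = sorted({name for pair in edges for name in pair})
    # Pajek numbers vertices from 1.
    index = {name: i for i, name in enumerate(names, start=1)}
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"*Vertices {len(names)}\n")
        for name in names:
            f.write(f'{index[name]} "{name}"\n')
        f.write("*Arcs\n")
        for (retweeter, source), weight in sorted(edges.items()):
            f.write(f"{index[retweeter]} {index[source]} {weight}\n")
```

A tweet such as "RT @bob RT @alice ..." credits both users, which treats every name in a retweet chain as a source; counting only the first match would be an equally defensible choice.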

Linked

Any URL in g20-tweets.csv that started with http:// was extracted and stored (using get-http.py), and wget was used to archive a copy of each one. In total, 6,653 different URLs were extracted. Note that wget does not retrieve Flash video, so it’s likely that any such video was lost. (There were claims in the tweets that YouTube was removing videos of police actions.) Other sites may have been unreachable. Finally, URL shorteners no doubt meant that the same site was posted under a range of different URLs.
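The get-http.py script itself isn't shown here; what follows is a minimal Python sketch of the same idea. It assumes the headerless five-column layout above and treats any run of non-space characters beginning with `http://` in the status field as a URL. The function names are hypothetical, and stripping trailing punctuation is an extra guess about how links were embedded in tweets.

```python
import csv
import re

# Any run of non-space characters starting with http:// (per the rule above).
URL_PATTERN = re.compile(r"http://\S+")
# Punctuation that often clings to the end of a link in running text.
TRAILING = ".,;:!?)]\"'"


def extract_urls(csv_path):
    """Return the sorted, de-duplicated http:// URLs found in tweet texts."""
    urls = set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 5:
                continue  # skip malformed lines
            for match in URL_PATTERN.findall(row[4]):
                url = match.rstrip(TRAILING)
                if len(url) > len("http://"):
                    urls.add(url)
    return sorted(urls)


def write_url_list(urls, out_path):
    """Write one URL per line, the layout wget reads with --input-file."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")
```

The resulting list could then be handed to wget, e.g. `wget --input-file=urls.txt --page-requisites --convert-links`; these flags are a suggestion, not the options actually used for the archive. The sketch does nothing about shortened URLs, so the same page may still be fetched several times.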

g20tweetweb.zip [108 Mb]

Note that I haven’t really even looked at that archived material yet, so it’s provided as-is.

Update 7/12/2012: Please note that I’ve removed the above links in case they clash with Twitter’s new terms of use around sharing Twitter data sets. These were collected before any such changes occurred, but I am also being responsive to privacy concerns. If you are a researcher interested in the data, please just contact me.

Update 4/12/2013: Congratulations to Jennifer Earl and crew on an interesting piece of research that incorporates these data: “This protest will be tweeted: Twitter and protest policing during the Pittsburgh G20.”

Archiving the White House (https://alex.halavais.net/archiving-the-white-house/)
Thu, 17 Sep 2009

As Mashable reports, the White House is looking for someone to archive its social network materials. I considered developing a proposal, but proposals are due in two weeks, and they then want an “off the shelf” implementation within 30 days. I suspect this means they already have a product in mind, and I’d be interested to know what it is.

Far more interesting to me, however, is whether they will make these archives immediately available to the public, as well as to the National Archives, etc. If not, they should. There’s no good reason not to. But that doesn’t mean they will.

More to the point, if they don’t, we should. So, assuming I don’t answer the RFQ from the White House (and given my schedule right now, that’s a good assumption), does someone want to help cobble together an archiving system for the White House site, Twitter feed, Facebook page, etc.? They are looking to capture comments as well, but I would want to go further and collect retweets, in-links, and the like. It’s a non-trivial effort, but such a system could have good research and commercial uses beyond the White House.

Shifted Pace (https://alex.halavais.net/shifted-pace/)
Wed, 24 Jun 2009

A few weeks back, I got an IM from someone checking in. He had gathered that my work had “changed pace.” I asked what he meant, and he suggested that I had slowed down.

Now, I am naturally lazy (a trait I am trying to cultivate more actively), but I gather he had figured that because I haven’t been blogging or tweeting or posting any of those other continual status updates, I must be slacking. As usual, my blogging (including micro-blogging) is inversely proportional to how busy I am: when I go quiet, it is because I am busier, not less busy. There is a small caveat: sometimes blogging is an indicator that I am procrastinating, and therefore should be busy. On very rare occasions, when the stars align, it is actually linked to progress on a project, but generally speaking, silence on this front should never be taken as an indication that I am relaxing a little.

On the other hand, the number of hours I have each week to work on projects is somewhat limited because I am Jasper’s daytime parent (with some help). This remains my priority, and though it sometimes means sacrificing things I would like to do, there is never going to be another time to hang out with my six-month-old, so he wins. As it is, I wish I could spend even more time with him.

In what seems to be a perennial sort of post, here are some of the projects I’m working on right now, besides raising the future benevolent dictator of our solar system:

  • Writing Course at Quinnipiac University. I’ve been dragged–somewhat against my will :)–into teaching the “writing for interactive” course this summer. Actually, the content of the course isn’t what puts me off: it’s that (a) it is in the summer, and I would like to reserve summers for research and projects, and (b) it’s 5 weeks long. It is hard enough to teach a course in 15 weeks without students feeling overwhelmed. When you compress that into 5 weeks–and it’s the same number of credits, so I think we should hit the material at the same depth–it is just impossible. So dealing with that tension, particularly in a writing course, is going to be difficult. I also need to revise my fall seminars: I’m organizing one of my courses around reading and annotating Little Brother, and heavily revising my intro (ICM 501) course. (I have also felt a recent disruption in the force in the ICM program, which will probably require putting even more cycles toward re-keeling it.)
  • Digital Media & Learning Hub. I haven’t been talking publicly in any organized way about this, but some of you know that I have been working with the DML Hub, a group constituted to improve collaboration among researchers funded by the MacArthur Foundation’s Digital Media and Learning initiative. I’m working with a team to create a DML Collaboratory site for researchers, as well as an external site that will seek to gather the current state of the art in one place. I’m also in the early stages of working with a group to establish some norms for sharing data, particularly qualitative data. I’ll actually be blogging a bit about this latter project in the coming week, and probably tweeting a little about the Collaboratory and that process.
  • Twittering and Protesting. Happy to have the opportunity to work with Maria Garrido again, this time on a project that tracks the ways in which Twitter is being used both to build identity and to coordinate action. This is one of two papers that I’ve promised for the AoIR meeting next year, and one of two Twitter-related research pieces I’m working on, both at early stages. I will be blogging a bit about it as it develops.
  • Association of Internet Researchers. In the short term, I am setting up a registration site, but I am desperately hoping that I can get the Exec behind using it in the long term as well. It would make my life so much easier, and everyone else’s as well! It still doesn’t solve the paper submission and refereeing system issues, but I really hope we are able to move to a different system for that next year. I’m looking forward to talking to next year’s organizers about how to make that work out a bit better.

A lot of other things are right on the cusp of needing to be done, but I’m trying to keep my head clear of them for the moment. It really doesn’t seem that bad when it’s spelled out as above. Of course, there are the other pending things: three book projects, whipping some old research into publishable form, a grant proposal sometime later this year, various talks, digitizing my library, etc. But I’m trying to keep those things out of mind wherever possible.
