Went yesterday to a symposium: Scholarly Publishing and Archiving on the Web. This was the fifth symposium hosted by the University at Albany, and it was a small gathering, but a focused group. It was a little funny to be one of the non-librarians in the crowd, but once again it seems as though the library folks are especially willing and able to meet the challenges of new media head on. I sketched out some of the following notes during the day.
Maximizing Research Impact Through Institutional Self-Archiving
The conference began with a keynote from Stevan Harnad, who is a professor at the University of Southampton, and a force behind the creation of the eprints software. His argument is not that librarians and information scientists need to turn to electronic publishing–they are already converted. The question is what structures are needed to allow researchers to publish online. This issue meshes well with what we have been talking about with the Microcontent Research Center. What do we need to do to encourage the kind of collaborative work that scholars, at least in theory, are already interested in doing?
He began by trying to recall us to our higher selves, asking why we decided not to sell junk bonds. His focus was firmly on refereed journal articles. He holds aside books written for profit, and sponsored research, as separate and somehow tainted. We are, he argues, “monks, but not saints”–we can be expected to turn a profit on popular books, while “giving away” some of our most valuable ideas. With this in mind, his focus is on our better selves, the part of our research that contributes to the store of knowledge.
University administrators have the same goal as researchers: research impact. As such, anything that inhibits the process of giving away research reports is a thorn in the side of both researchers and administrators. Paying for access is just such an obstacle. Research must be made openly accessible: “those days, where tolls are paid for refereed articles, are over.”
Peer review remains essential. It is not something that is tied to the old regime, and we do not necessarily want to move away from it. There are a large number of hypotheses for ways to improve upon refereeing, but these remain untested (e.g., the “guild system” Kling will talk about later).
The process as it now stands works like this. You do research and write a pre-print, which is pre-refereed by your colleagues. All research appears somewhere; it’s just a matter of where. The pre-print is then reviewed by peer experts before being published on paper. This process is imperfect, with a lot of wasted time, but it works.
The average article costs (“to the world”) $2000 for access. This was a necessary evil during the “Gutenberg era” but no longer. Peer review, however, remains vital, and the process is dynamic. It isn’t just accept or reject; reviewers also increase the quality of submissions. We don’t have to throw out the baby with the bathwater.
The maximum cost per paper for peer review is $500. The math works: you can see the economics are in favor of electronic distribution. But we don’t have the time to wait for this to work itself out. We need to self-archive our refereed papers.
The existing, traditional process need not disappear. The only added elements: self-archive the pre-print (optional), and self-archive the post-print (mandatory). Articles available online are cited 336% more often, and self-archiving leads to higher values in other measures of impact as well. The arXiv is good, but could be better if we decentralized the process. We need to move beyond growth to take-off. All universities should mandate self-archiving and online CVs. Assessment could work by automatically crawling everyone’s CVs and assembling the data. The software is all available open source (!).
I really like this idea of decentralized publishing. It appeals greatly to my more anarchic side. It also seems to harness what the internet is really good at. And there are already protocols and tools available to do this stuff.
We need to do this in the School of Informatics, or at least at the departmental level. But for this to happen it needs to be mandated “top down.” Folks will resist, complaining (with reason) about time and resources. Had a chance to talk to June and Miguel a little about this. We could set up a grad student (Alex or Raymond) to do some of the heavy lifting in terms of setting up the CVs or early prints. Miguel is going to install eprints or DSpace on his server to experiment with.
In the discussion, someone asked whether this isn’t like the Napster case all over again. In the case of Napster, the authors wanted to be paid and the technology inhibited this, while in the case of academics, the authors want to give away their reports but are blocked by copyright and publishing contracts. It is the inverse of Napster.
Panel: What systems are out there?
Simeon Warner talked about the Open Archives Initiative. Think of a number of repositories: the OAI is the glue. The initial aim was to allow interoperability between open pre-print archives. Now, it focuses on standards and “the technology part.” Open archives and open access are not identical, but they are complementary.
The protocol for metadata harvesting grabs XML documents, which must contain some basic level of Dublin Core metadata. Repositories open themselves up to harvesting, and the collected data becomes searchable. Other services could be built as well: subject aggregation, etc.
Three ways are available to make the (meta)data available: a centralized record, one or more servers for contributions, or something more peer-based. What if you are tiny, and only have a few papers? There is a static, non-push approach that requires only exposing a metadata file.
This process should make librarians happy, because it gets them involved in creating, as well as distributing, information and knowledge. OAI provides a basis for services to exchange metadata.
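To make the harvesting side of this concrete, here is a minimal sketch of what an OAI-PMH harvester does with a `ListRecords` response. The endpoint URL, record identifier, title, and author below are all invented for illustration; a real harvester would fetch the XML over HTTP rather than from a string, but the namespaces and Dublin Core structure follow the OAI-PMH 2.0 conventions.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses and their Dublin Core payloads.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hand-made ListRecords response (record content is hypothetical).
# A real harvester would issue something like:
#   GET http://example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Self-Archiving and Research Impact</dc:title>
          <dc:creator>A. Scholar</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def harvest_titles(xml_text: str) -> list:
    """Pull every dc:title out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//dc:title", NS)]

print(harvest_titles(SAMPLE))  # prints ['Self-Archiving and Research Impact']
```

The appeal of the protocol is exactly this simplicity: a service provider only needs an HTTP client and an XML parser to aggregate metadata across many repositories.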
Nancy Harm demonstrated Luna Imaging. It was interesting, certainly, but the proprietary nature of the software, and the fact that it was pretty much a demo, combined with a warm room and a 4am morning to make me a bit sleepy. Very slick system, though, with some nice, fairly intuitive ways of manipulating images in an archive.
Catherine Candee, who is running the eScholarship project of the California Digital Library, asks: Are we undermining the gift culture of the Academy?
The CDL was created to make one library for the entire university. The first part of this was to make good use of economies of scale across the UC campuses.
In this process, it struck them that it was a little dumb to be spending $23 million on subscriptions to buy back what their own paid faculty had already given away. Of the top 2000 journals, 12% of senior editors and 10% of authors are UC people. So, there were lots of people wanting to share info, and lots of disparate projects. Quickly they recognized a need for interoperability, and support for the interests of the existing faculty.
The eScholarship repository includes tools for everything from simply uploading a paper to full peer review. This is also decentralized, but down to the units, rather than putting the onus on individual faculty. Now the infrastructure is becoming more known, so the faculty is increasingly involved.
Also worked on an XML book delivery project. About 750 titles, most available only to UC (350 available locally). Right now, this is increasing hardcopy sales, as well, but “ask me in two years.”
In addition, there are several edited volumes being put together online. When an edited volume is created, the items that have been peer reviewed show up before the whole volume is done, so that a paper that has been deemed ready need not wait for the stragglers. Once all are complete, the editor can write an introduction and “publish” the book to the Web. UC Press looks at the finished volumes and decides whether they want to print each one on paper as well.
Rob Kling: Disciplining the Internet to Promote Scholarly Communication via Publication
Electronic distribution is not just an advantage to the individual writer, but to readers as well. John Unsworth stresses the need to learn from failures as well as successes, and this applies to epublishing.
Sociotechnical failure modes exist and are difficult: people may not come, etc. There seems to be a lack of systematic investigation. There is a similarity here between complex archiving projects and bridge building. Henry Petroski finds that bridge-builders who did not obsessively study failures tended to make the same mistakes over again. “Failure aversion” rather than just “success extension” is the way to go for building robust systems.
Why is arXiv not a model that fits all? Is there an alternative that works for disciplines other than physics and the other fields comfortable with multiple publication? Some researchers (e.g., protein chemists) may share data but not writing, and other novel combinations occur as well.
eBiomed should have worked as well as arXiv. When a call for comment went out, several bio association heads (Federation of American Societies for Experimental Biology) came out strongly against such an idea. They feared that members would jump ship if their private journals were not a draw. PNAS had a different objection, saying that each paper on the site would have to be refereed so as not to drag the general quality downward.
PubMedCentral, the ultimate result, is run by associations and publishers. There were not enough problems to drive folks to the streets. There was not a cultural history of sharing in the biosciences. And many valued journals as an attention filter. In other words, there were significant social and cultural barriers that did not exist in the physics case.
Kling is looking for a middle ground between peer review and no review at all. This is his “guild model”: pulling together existing working papers, and the like, and making them more widely exchangeable.
The idea is to move toward the model of Harvard’s business school working papers: career-review, not peer review–granularity is at the person or program, not the individual paper.
DOE links together a large number of archives into a pre-print network. By drawing on these “guild centers” in each discipline, there can be an efficient exchange of pre-prints. Why are the Info Sciences “cheering on without joining in”? We need to fix that.
I asked Kling at the break about the potential problems with the “guild model” in terms of reputation reinforcement. He glossed over this as a potential problem several times, but I think it may be more than a bug. This is potentially a very bad thing, reinforcing the already too-inbred and introspective disciplines. It also favors the star system, rather than merit of thought or analysis. This would be a giant step backward, in my humble opinion. It looks as though he is aiming for the default system, what might end up happening if other approaches are not taken. This is a great warning against the danger of velleity, but he certainly could not sell me on the advisability of such a system.
Institutional Venue for Electronic Publication and Distribution of Scholarly Content
Maria Bonn spoke a bit about the role of the “Scholarly Publishing Office” at the University of Michigan. In particular, their efforts to reuse patterns and software were admirable.
Susan Gibbons spoke about the effort to do a large-scale trial run of DSpace at the University of Rochester. The provost there drove the effort, and there was a wide range of interests for archiving: dissertations, working papers, performance recordings from the 40s, conference proceedings. They wanted something flexible enough to handle all of this.
Commercial systems were great for librarian-built archives but expensive, while open source systems were both free and extensible. They started with eprints, but ended up with DSpace because it could do a lot more in terms of providing services. It was highly flexible, format neutral, and could create “communities.”
They are part of the DSpace federation of universities doing a pretty major public beta. They should be up and running May 19.
Finally, Timothy Stephen spoke a bit about the (long) history of CIOS. I was familiar with much of this, but it was interesting to think about how early CIOS was, and how little it is recognized. It seems, in some ways, to be emblematic of the discipline of communication. Talked to him a bit after: hadn’t realized how involved Tom Jacobson was with the CIOS project.
So, there are some brief notes. Overall, I found this to be well worth the day spent. Perhaps because I am somewhat disconnected from the libraries, much of this was a pleasant surprise for me. I know there was some talk of automating CVs floating around in blog-space recently. It seems that if we are to do this, we should take away the lessons of eprints (and look for interoperability), and should look at ways microcontent can help to glue together an open publishing scheme.