Scholarly web publishing

Went yesterday to a symposium: Scholarly Publishing and Archiving on the Web. This was the fifth symposium hosted by the University at Albany, and it was a small gathering, but a focused group. It was a little funny to be one of the non-librarians in the crowd, but once again it seems as though the library folks are especially willing and able to meet the challenges of new media head on. I sketched out some of the following notes during the day.

Maximizing Research Impact Through Institutional Self-Archiving

The conference began with a keynote from Stevan Harnad, who is a professor at the University of Southampton, and a force behind the creation of the eprints software. His argument is not that librarians and information scientists need to turn to electronic publishing–they are already converted. The question is what structures are needed to allow researchers to publish online. This issue meshes well with what we have been talking about with the Microcontent Research Center. What do we need to do to encourage the kind of collaborative work that scholars, at least in theory, are already interested in doing?

He began by trying to recall our higher selves, asking why we decided not to sell junk bonds. His focus was firmly on refereed journal articles. He holds aside books written for profit, and sponsored research, as separate and somehow tainted. We are, he argues, “monks, but not saints”–we can be expected to turn a profit on popular books, while “giving away” some of our most valuable ideas. This in mind, his focus is on our better selves, the part of our research that contributes to the store of knowledge.

University administrators have the same goal as researchers: research impact. As such, anything that inhibits the process of giving away research reports is a thorn in the side of both researchers and administrators. Paying for access is just such an obstacle. Open access must be made available: “those days, where tolls are paid for refereed articles, are over.”

Peer review remains essential. It is not something that is tied to the old regime. We do not necessarily want to move away from peer review. There are a large number of hypotheses for ways to improve upon refereeing, but these remain untested (e.g., the “guild system” Kling will talk about later).

The process as it stands now is like this. You do research and write a pre-print, which is pre-refereed by your colleagues. All research appears somewhere; it’s just a matter of where. The pre-print is then reviewed by peer experts before being published on paper. This process is imperfect, with a lot of wasted time, but it works.

The average article costs (“to the world”) $2000 for access. This was a necessary evil during the “Gutenberg era” but no longer. Peer review, however, remains vital. This process is dynamic. It isn’t just accept or reject; reviewers also increase the quality of submissions. We don’t have to throw out the baby with the bathwater.

Max cost per paper for peer review is $500. The money works: you can see the economics are in favor of electronic distribution. But we don’t have the time to wait for this to work itself out. We need to self-archive our refereed papers.
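
As a back-of-the-envelope check on those two figures, here is a minimal Python sketch. Only the $2000 and $500 amounts come from the talk; the function name and the paper count in the usage note are hypothetical, chosen just to show the arithmetic.

```python
# Per-paper figures as quoted in the talk.
TOLL_COST_PER_PAPER = 2000         # cost "to the world" of toll access, per article
PEER_REVIEW_COST_PER_PAPER = 500   # maximum cost of peer review, per article


def world_totals(papers):
    """Compare total toll-access spending with total peer-review cost."""
    tolls = papers * TOLL_COST_PER_PAPER
    review = papers * PEER_REVIEW_COST_PER_PAPER
    return {
        "tolls": tolls,
        "peer_review": review,
        "difference": tolls - review,
        # Fraction of today's access spending that peer review accounts for.
        "review_share": review / tolls,
    }


if __name__ == "__main__":
    print(world_totals(1000))
```

For a hypothetical 1,000 papers this gives $2,000,000 in tolls against $500,000 in peer review, so on these numbers peer review accounts for a quarter of what is spent on access.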

The existing, traditional process need not disappear. The only added element: self-archive the pre-print (optional), and self-archive the post-print (mandatory). Articles available online are cited 336% more often. Self-archiving leads to higher values in other measures of impact as well. The arXiv is good, but could be better if we decentralized the process. We need to move beyond growth to take-off. All universities should mandate self-archiving and online CVs. Assessment could work by automatically crawling everyone’s CVs, and assembling the data. The software is all available open source (!).

I really like this idea of decentralized publishing. It appeals greatly to my more anarchic side. It also seems to harness what the internet is really good at. And there are already protocols and tools available to do this stuff.

We need to do this in the School of Informatics, or at least at the departmental level. But for this to happen it needs to be mandated “top down.” Folks will resist, complaining (with reason) about time resources. Had a chance to talk to June and Miguel a little about this. We could set up a grad student (Alex or Raymond) to do some of the heavy lifting in terms of setting up the CVs or early prints. Miguel is going to set up eprints or DSpace on his server to play with a bit.

In the discussion, someone asked whether this isn’t like the Napster case all over again. In the case of Napster, the authors wanted to be paid and the technology inhibited this, while in the case of academics, the author wants to give away the report, but is blocked by copyright and publishing contracts. It is the inverse of Napster.

Panel: What systems are out there?

Simeon Warner talked about the Open Archives Initiative. Think of a number of repositories: the OAI is the glue. The initial aim was to allow interoperability between open pre-print archives. Now, it focuses on standards and “the technology part.” There is a difference between open archives and open access: the two are not identical, but they are complementary.

The protocol for metadata harvesting grabs XML documents. These must contain some basic level of Dublin Core metadata. Repositories open themselves up to collection. This data is now searchable. Other stuff could be done as well: subject aggregation, etc.
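
To make the harvesting step concrete, here is a minimal sketch in Python of pulling Dublin Core fields out of an OAI-PMH `ListRecords` response. The sample response, the `oai:example.org` identifiers, and the `parse_records` helper are made up for illustration. A real harvester would fetch the XML from a repository's base URL with a request such as `?verb=ListRecords&metadataPrefix=oai_dc`, and would also handle resumption tokens and error responses, which this sketch does not.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 and its unqualified Dublin Core format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed-down response standing in for what a repository returns.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2003-04-08</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Preprint</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:2</identifier>
        <datestamp>2003-04-09</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Another Hypothetical Paper</dc:title>
          <dc:creator>Roe, Richard</dc:creator>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def parse_records(xml_text):
    """Return one dict per record with its identifier, titles, and creators."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Deleted records carry a header but no metadata; skip them here.
            continue
        records.append({
            "identifier": identifier,
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["titles"], record["creators"])
```

Once the records are plain dictionaries like this, building a search index or a subject aggregation on top of them is an ordinary data-processing job.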

Three ways are available to make the (meta)data available: a centralized record, one or more servers for contributions, or something more peer-based. What if you are tiny, and only have a few papers? There is a static, non-push approach that requires only exposing a metadata file.
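
For the smallest case, the static approach might look something like this Python sketch: write a handful of records to a single XML file and put it on a web server. The element layout here (a `records` root with `record` children) is a simplified stand-in for illustration, not the schema the OAI defines for static repositories; only the Dublin Core namespace is real. The `build_metadata_file` name and the sample paper are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Ask ElementTree to serialize Dublin Core elements with the usual "dc" prefix.
ET.register_namespace("dc", DC_NS)


def build_metadata_file(papers):
    """Serialize a list of {dc_field: value} dicts as one XML document string."""
    root = ET.Element("records")  # hypothetical wrapper element, not the OAI schema
    for paper in papers:
        record = ET.SubElement(root, "record")
        for field, value in paper.items():
            element = ET.SubElement(record, f"{{{DC_NS}}}{field}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    papers = [
        {
            "title": "A Hypothetical Working Paper",
            "creator": "Doe, Jane",
            "date": "2003-04-08",
        },
    ]
    # In practice this string would be saved to a file on the web server.
    print(build_metadata_file(papers))
```

The appeal for a tiny archive is that nothing has to run on the server: the file only needs to be regenerated when a paper is added.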

This process should make librarians happy, because it gets them involved in creating, as well as distributing, information and knowledge. OAI provides a basis for services to exchange metadata.

Nancy Harm demonstrated Luna Imaging. It was interesting, certainly, but the proprietary nature of the software, and the fact that it was pretty much a demo, combined with a warm room and a 4am morning to make me a bit sleepy. Very slick system, though, with some nice, fairly intuitive ways of manipulating images in an archive.

Catherine Candee, who is running the eScholarship project of the California Digital Library, asks: Are we undermining the gift culture of the Academy?

The CDL was created in an effort to make one library for the entire university. The first part of this was to make good use of economies of scale across the UC campuses.

In this process, it struck them that it was a little dumb to be spending $23 million in subscriptions to buy back what their paid faculty already gave away. Of the top 2000 journals, 12% of senior editors and 10% of authors are UC people. So, there were lots of people wanting to share info, and lots of disparate projects. Quickly they recognized a need for interoperability, and support for the interests of the existing faculty.

The eScholarship repository includes tools for everything from simply uploading a paper to full peer review. This is also decentralized, but down to the units, rather than putting the onus on individual faculty. Now the infrastructure is becoming more known, so the faculty is increasingly involved.

Also worked on an XML book delivery project. About 750 titles, most available only to UC (350 available locally). Right now, this is increasing hardcopy sales, as well, but “ask me in two years.”

In addition, there are several edited volumes being put together online. When an edited volume is created, the items that have been peer reviewed show up before the whole volume is done, so that a paper that has been deemed ready need not wait for the stragglers. Once all are complete, the editor can write an introduction, and “publish” the book to the Web. UC Press looks at the finished volumes, and decides whether they want to print each one on paper as well.

Rob Kling: Disciplining the Internet to Promote Scholarly Communication via Publication

Electronic distribution is not just an advantage to the individual writer, but to readers as well. John Unsworth stresses the need to learn from failures as well as success, and this applies to epublishing.

Sociotechnical failure modes exist and are difficult: people may not come… etc. There seems to be a lack of a systematic investigation. There is a similarity here between complex archiving projects and bridge building. Henry Petroski finds that bridge-builders who did not look obsessively at failures in order to build good bridges tended to make the same mistakes over again. “Failure aversion” rather than just “success extension” is the way to go for building robust systems.

Why is arXiv not a model that fits all? Is there an alternative that works for disciplines other than physics and the like, which are comfortable with multiple publication? Some researchers (e.g., protein chemists) may share data but not writing, and other novel combinations occur as well.

eBiomed should have worked as well as arXiv. When a call for comment went out, several bio association heads (Federation of American Societies for Experimental Biology) came out strongly against such an idea. They feared that membership would jump ship if their private journals were not a draw. PNAS had a different objection, saying that each paper on the site would have to be refereed so as not to draw the general quality downward.

PubMedCentral, the ultimate result, is run by associations and publishers. There were not enough problems to drive folks to the streets. There was not a cultural history of sharing in the biosciences. And many valued journals as an attention filter. In other words, there were significant social and cultural barriers that did not exist in the physics case.

Kling is looking for a middle ground between peer review and no review at all. This is his “guild model”: pulling together existing working papers, and the like, and making them more widely exchangeable.

The idea is to move toward the model of Harvard’s business school working papers: career-review, not peer review–granularity is at the person or program, not the individual papers.

DOE links together a large number of linked archives into a pre-print network. By drawing on these “guild centers” in each discipline, there can be an efficient exchange of pre-prints. Why are Info Sciences “Cheering on without joining in”? Need to fix that.

I asked Kling at the break about the potential problems with the “guild model” in terms of reputation reinforcement. He glossed over this as a potential problem several times, but I think it may be more than a bug. This is potentially a very bad thing, reinforcing the already too-inbred and introspective disciplines. It also favors the star system, rather than merit of thought or analysis. This would be a giant step backward, in my humble opinion. It looks as though he is aiming for the default system, what might end up happening if other approaches are not taken. This is a great warning against the danger of velleity, but he certainly could not sell me on the advisability of such a system.

Institutional Venue for Electronic Publication and Distribution of Scholarly Content

Maria Bonn spoke a bit about the role of the “Scholarly Publishing Office” at the University of Michigan. In particular, their efforts to reuse patterns and software were admirable.

Susan Gibbons spoke about the effort to do a large scale trial run of DSpace at the University of Rochester. The provost there drove the effort, and there was a wide range of interests for archiving: dissertations, working papers, performance recordings from the 40s, conference proceedings. They wanted something flexible enough to handle this all.

Commercial systems were great for librarian built archives, including (expensive e-pubs bit), while open source systems were both free and extensible. Started with eprints, but ended up with DSpace because it could do a lot more in terms of providing services. It was highly flexible, format neutral, and could create “communities.”

They are part of the DSpace federation of universities doing a pretty major public beta. They should be up and running May 19.

Finally, Timothy Stephen spoke a bit about the (long) history of CIOS. I was familiar with much of this, but it was interesting to think about how early CIOS was, and how little it is recognized. It seems, in some ways, to be emblematic of the discipline of communication. Talked to him a bit after: hadn’t realized how involved Tom Jacobson was with the CIOS project.

So, there are some brief notes. Overall, I found this to be well worth the day spent. Perhaps because I am somewhat disconnected from the libraries, much of this was a pleasant surprise for me. I know there was some talk of automating CVs floating around in blog-space recently. It seems if we are to do this, we should take away the lessons of eprints (and look for interoperability), and should look at ways microcontent can help to glue together an open publishing scheme.



  1. Posted 4/9/2003 at 8:00 am | Permalink

    Thanks for the summary. Appreciate it.

  2. Alex
    Posted 4/9/2003 at 11:19 am | Permalink

    My pleasure. Though in the light of day, it seems I should have edited it rather than just doing a PDA dump :).

  3. Posted 4/12/2003 at 7:41 am | Permalink

    Not a bad summary, but a few major errors that could mislead readers.
    The biggest is that you refer to self-archiving as self-publishing and
    nothing could be further from the truth: Self-archiving is the
    depositing of *published* papers, both before and after peer review
    (preprints and postprints) into the researcher’s own university eprint
    archive. Calling it self-publishing gets it all wrong, and omits the
    all-important role of peer-review, which is the essential service that
    refereed-journal publishers (20,000 of them) provide today, and will
    continue to provide, even when all papers are accessible toll-free
    through author self-archiving.

    Toll-access costs $2000 per paper, and provides access only to those
    would-be users whose universities can afford the tolls. All researchers
    want all would-be users to access their work, so they can read, use,
    cite, apply and otherwise build upon it. That is “research impact” and
    it is why these authors do and give away their research in the first
    place. They can remove the needless and obsolete impact-loss caused by
    toll-access barriers by self-archiving their research. The journals
    provide only one essential service, and that is peer review. (They
    implement the service only, as autonomous, 3rd party service providers
    for authors and readers; the peers themselves do the peer review for
    free.) The cost of implementing peer review is $500 per paper. If and
    when (but only if and when) self-archiving should reduce the market for
    the journals’ toll-access product to where it can no longer cover that
    essential $500 expense, universities will already have more than enough
    annual windfall savings from each one’s contribution to the collective
    $2000 tolls per *incoming* paper to pay the $500 per *outgoing* paper to
    cover the peer review.

    But that is all just hypothetical speculation about a future “what if”.
    What is immediately needed now, to stop the needless and
    counterproductive loss of research impact from potential users whose
    universities cannot afford the tolls (no university can afford anywhere
    near all of the 20,000 refereed journals, most can afford only a tiny
    subset) is to provide an open-access version to those who cannot afford
    the toll-access, via self-archiving.

    That’s the corrected version of my talk, which recommended that
    universities and research-funding bodies mandate self-archiving of all
    university refereed research output to maximize research impact and
    benefits for all.

    The point about peer review is that there is no reason that the price
    paid for open access should be that the quality of the peer-reviewed
    research to which we want to free access for all would-be users should
    be put at risk or compromised in any way by coupling it with untested
    alternatives to peer review. Self-archiving has been tested for over 12
    years, and it works, and works dramatically. Peer-review alternatives
    (like the “guild” system, and many other such speculative proposals)
    have not been tested, and are merely notions. Moreover, they are 100%
    unnecessary. Self-archiving is a *supplement* to the current journal
    system, not a *substitute* for it. And especially not for peer-review,
    which is its most important component: Paper publication will diminish
    with time, so will the publisher’s proprietary PDF page images and
    online add-on enhancements, but peer-review (until/unless a better
    alternative is found, tested, and proved to work at least as well) is
    here to stay. So self-archiving is about authors’ universities providing
    open access to their own *peer-reviewed* (= published) research output,
    to maximize its impact; not about self-publishing non-peer-reviewed
    preprints only, instead of both the preprints and their peer-reviewed,
    published successors, postprints, which is self-archiving, not
    self-publishing.

    By the same token, the talk by Catherine Candee of U. California
    Digital Libraries conflated a university’s self-archiving of
    peer-reviewed, published research output with the university’s
    self-publishing of its own research. Some university “in-house” journals
    may be possible, but they cannot and will not replace the 20,000
    autonomous peer-reviewed journals that exist, for the simple reason that
    universities cannot be their own peer-policemen: They haven’t the
    expertise, and it would quickly be recognized as self-serving and
    unreliable. Peer review depends on independent worldwide expertise. One
    cannot write one’s own recommendation letters.

    Universities mix up the self-archiving and self-publishing agendas
    because of revenue problems: They spend huge amounts on toll-access to
    journals. They (mistakenly) call this “buying back what we give away.”
    But this is completely incorrect! They don’t buy back what they give
    away. (They have what they give away already, and no journal publisher
    has or can have any objection to in-house uses by the author’s
    institution of the author’s own work!) So universities don’t spend their
    serials budgets on “buying back” the research output they give away —
    though it is true they give away their own research output. They *buy-in*
    — through publishers’ access-tolls — the research output of *other*
    universities, research output that those universities have themselves
    given away (to their journal publishers).

    So the “buyback” lament is the wrong one (though there is a rightful
    lament to be made, just not that one). The rightful lament is that
    *access* to universities’ own giveaway by would-be users at *other*
    institutions is needlessly blocked by access-tolls. But the remedy to
    that is not for universities to go into the toll-access self-publishing
    business (even if their tolls are lower)! It is to self-archive their
    own refereed research, thereby making it accessible toll-free to all
    users. The enhanced research impact is already reward enough for that.
    (It means enhanced research productivity, funding, prestige, prizes.)
    But possibly — and only possibly — it might lead to windfall
    toll-access savings on their periodicals budget, if/when open-access
    shrinks the market for toll-access, tipping the transition to up-front
    peer-review service charges, per outgoing paper, instead of the present
    toll-access charges, per incoming paper. Self-archiving is reciprocal
    across institutions.

    (Moreover, I doubt that even a system as big as U of C provides 10% of
    the journal content it buys in! And the fact that its own researchers
    are the editors of 12% of the journals it buys in, if true — I think
    that’s an overestimate too! — is also completely irrelevant, as those
    journals are independent, 3rd party journals, not U of C in-house
    journals; nor should they be, if peer-review is to continue to be an
    autonomous, expertise-based system, rather than a local vetting

    Not just U of C, but also Michigan and Rochester, in their
    presentations, are conflating the important and immediate and clearly
    remediable problem of research impact loss owing to access-denial —
    solved completely by self-archiving — with various other university
    revenue-generation schemes, such as the founding of new in-house
    journals, profiting from new online economies. There may be room for a
    few new journals to add to the existing 20,000, if they can establish
    sufficiently high-quality peer-review standards, but that is not the
    solution to the access-denial problem (if the new journals are likewise
    toll-access) and it is still a very risky proposition if the new
    journals are open-access journals, charging for peer review (before the
    windfall toll-cancellation savings from the existing 20,000 have had a
    chance to build up).

    So the message is: Don’t mix up self-archiving with self-publishing,
    don’t tamper with peer-review until/unless you have a tried and true
    alternative, and in the meanwhile, don’t delay with the self-archiving
    of your own research: Your research impact is being needlessly lost

    The URL for the free eprints archive-creating software is incorrect
    in the summary: It should be:

    Stevan Harnad

  4. Alex
    Posted 4/12/2003 at 9:21 am | Permalink

    Thank you very much for taking the time to clarify the errors. I greatly appreciate it. As I said, much of this is new (and exciting) to me, so I am not surprised that I misread some of your comments. I was particularly intrigued by your presentation, and so I especially appreciate your clarification.

  5. Posted 4/12/2003 at 1:20 pm | Permalink

    Thanks for the summary and clarifications. This is a tremendous part of what we’re trying to do over at the Disseminary; I wish I could have come to the conference. Maybe next year. . . .
