Scholarly web publishing

By alex | Published: 4/9/2003

Went yesterday to a symposium: Scholarly Publishing and Archiving on the Web. This was the fifth symposium hosted by the University at Albany, and it was a small gathering, but a focused group. It was a little funny to be one of the non-librarians in the crowd, but once again it seems as though the library folks are especially willing and able to meet the challenges of new media head on. I sketched out some of the following notes during the day.

Maximizing Research Impact Through Institutional Self-Archiving

The conference began with a keynote from Stevan Harnad, who is a professor at the University of Southampton, and a force behind the creation of the eprints software. His argument is not that librarians and information scientists need to turn to electronic publishing–they are already converted. The question is what structures are needed to allow researchers to publish online. This issue meshes well with what we have been talking about with the Microcontent Research Center. What do we need to do to encourage the kind of collaborative work that scholars, at least in theory, are already interested in doing?

He began by trying to recall our higher selves, asking why we decided not to sell junk bonds. His focus was firmly on refereed journal articles. He holds aside books written for profit, and sponsored research, as separate and somehow tainted. We are, he argues, “monks, but not saints”–we can be expected to turn a profit on popular books, while “giving away” some of our most valuable ideas. With this in mind, his focus is on our better selves, the part of our research that contributes to the store of knowledge.

University administrators have the same goal as researchers: research impact. As such, anything that inhibits the process of giving away research reports is a thorn in the side of both researchers and administrators. Paying for access is just such an obstacle. Open access must be made available: “those days, where tolls are paid for refereed articles, are over.”

Peer review remains essential. It is not something that is tied to the old regime. We do not necessarily want to move away from peer review. There are a large number of hypotheses for ways to improve upon refereeing, but these remain untested (e.g., the “guild system” Kling will talk about later).

The process as it stands is now like this. You do research and write a pre-print, which is pre-refereed by your colleagues. All research appears somewhere; it’s just a matter of where. The pre-print is then reviewed by peer experts before being published on paper. This process is imperfect, with a lot of wasted time, but it works.

The average article costs (“to the world”) $2000 for access. This was a necessary evil during the “Gutenberg era” but no longer. Peer review, however, remains vital. This process is dynamic. It isn’t just accept or reject; reviewers also increase the quality of submissions. We don’t have to throw out the baby with the bathwater.

Max cost per paper for peer review is $500. The money works: you can see the economics are in favor of electronic distribution. But we don’t have the time to wait for this to work itself out. We need to self-archive our refereed papers.
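
The arithmetic behind that claim can be sketched in a few lines of Python. The two dollar figures are the ones reported from the talk; the names used here are illustrative only, and treating tolls and peer-review fees as a simple one-for-one swap per paper is my simplifying assumption, not something the talk spelled out.

```python
# Per-paper figures as reported in the talk (US dollars).
TOLL_ACCESS_COST = 2000  # what "the world" pays for access to one article
PEER_REVIEW_COST = 500   # upper bound on the cost of refereeing one article


def per_paper_saving(toll=TOLL_ACCESS_COST, review=PEER_REVIEW_COST):
    """Return (saving, share) for a single paper.

    saving: dollars left over if only peer review has to be paid for.
    share:  peer review's cost as a fraction of the toll-access cost.
    """
    return toll - review, review / toll


saving, review_share = per_paper_saving()
print(f"Saving per paper: ${saving}")
print(f"Peer review as a share of the toll cost: {review_share:.0%}")
```

On these numbers peer review accounts for a quarter of what access costs today, which is the sense in which "the money works."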

The existing, traditional process need not disappear. The only added element: self-archive the pre-print (optional), and self-archive the post-print (mandatory). Articles available online are cited 336% more often. Self-archiving leads to higher values in other measures of impact as well. The arXiv is good, but could be better if we decentralized the process. Need to move beyond growth to take-off. All universities should mandate self-archiving and online CVs. Assessment could work by automatically crawling everyone’s CVs, and assembling the data. The software is all available open source (!).

I really like this idea of decentralized publishing. It appeals greatly to my more anarchic side. It also seems to harness what the internet is really good at. And there are already protocols and tools available to do this stuff.

We need to do this in the School of Informatics, or at least at the departmental level. But for this to happen it needs to be mandated “top down.” Folks will resist, complaining (with reason) about time resources. Had a chance to talk to June and Miguel a little about this. We could set up a grad student (Alex or Raymond) to do some of the heavy lifting in terms of setting up the CVs or early prints. Miguel is going to install eprints or DSpace on his server to play with a bit.

In the discussion, someone asked whether this isn’t like the Napster case all over again. In the case of Napster, the authors were wanting to be paid and the technology inhibited this, while in the case of academics, the author wants to give away the report, but is blocked by copyright and publishing contracts. It is the inverse of Napster.

Panel: What systems are out there?

Simeon Warner talked about the Open Archives Initiative. Think of a number of repositories: the OAI is the glue. The initial aim was to allow interoperability between open pre-print archives. Now, it focuses on standards and “the technology part.” There is a difference between the open archive and open access: the two are not identical, but they are complementary.

The protocol for metadata harvesting grabs XML documents. These must contain some basic level of Dublin Core metadata. Repositories open themselves up to collection. This data is now searchable. Other stuff could be done as well: subject aggregation, etc.
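
To make that a little more concrete, here is a rough Python sketch of what a harvester does with one of those XML documents. It is not from the talk: the embedded sample record, the identifier, and the function name harvest_records are all made up for illustration. The XML namespaces are the ones used by OAI-PMH and unqualified Dublin Core; a real harvester would fetch the XML from a repository (a ListRecords request with metadataPrefix=oai_dc) rather than parse a string.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed-down ListRecords response (one made-up record).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:1</identifier>
        <datestamp>2003-04-08</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical post-print</dc:title>
          <dc:creator>Author, Example</dc:creator>
          <dc:date>2003</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest_records(xml_text):
    """Extract the identifier and Dublin Core fields from each record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for element in dc:
                # Tags look like "{namespace}title"; keep only "title".
                name = element.tag.split("}", 1)[-1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


records = harvest_records(SAMPLE_RESPONSE)
for record in records:
    print(record["identifier"], record["fields"].get("title"))
```

Each Dublin Core field is kept as a list because elements such as creator may repeat within a record. Once records from many repositories are collected in this shape, building a search index or a subject aggregation over them is the easy part.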

Three ways are available to make the (meta)data available: a centralized record, one or more servers for contributions, or something more peer-based. What if you are tiny, and only have a few papers? There is a static, non-push approach that requires only exposing a metadata file.

This process should make librarians happy, because it gets them involved in creating, as well as distributing, information and knowledge. OAI provides a basis for services to exchange metadata.

Nancy Harm demonstrated Luna Imaging. It was interesting, certainly, but the proprietary nature of the software, and the fact that it was pretty much a demo, combined with a warm room and a 4am morning to make me a bit sleepy. Very slick system, though, with some nice, fairly intuitive ways of manipulating images in an archive.

Catherine Candee, who is running the eScholarship project of the California Digital Library, asks: Are we undermining the gift culture of the Academy?

The CDL was created as an effort to make one library for the entire university. The first part of this was to make good use of economies of scale across the UC campuses.

In this process, it struck them that it was a little dumb to be spending $23 million in subscriptions to buy back what their paid faculty already gave away. Of the top 2000 journals, 12% of senior editors and 10% of authors are UC people. So, there were lots of people wanting to share info, and lots of disparate projects. Quickly they recognized a need for interoperability, and support for the interests of the existing faculty.

The eScholarship repository includes tools for everything from simply uploading a paper to full peer review. This is also decentralized, but down to the units, rather than putting the onus on individual faculty. Now the infrastructure is becoming more known, so the faculty is increasingly involved.

Also worked on an XML book delivery project. About 750 titles, most available only to UC (350 available locally). Right now, this is increasing hardcopy sales, as well, but “ask me in two years.”

In addition, there are several edited volumes being put together online. When an edited volume is created, the items that have been peer reviewed show up before the whole volume is done, so that a paper that has been deemed ready need not wait for the stragglers. Once all are complete, the editor can write an introduction, and “publish” the book to the Web. UC Press looks at the finished volumes, and decides whether they want to print each one on paper as well.

Rob Kling: Disciplining the Internet to Promote Scholarly Communication via Publication

Electronic distribution is not just an advantage to the individual writer, but to readers as well. John Unsworth stresses the need to learn from failures as well as success, and this applies to epublishing.

Sociotechnical failure modes exist and are difficult: people may not come… etc. There seems to be a lack of a systematic investigation. There is a similarity here between complex archiving projects and bridge building. Henry Petroski finds that bridge-builders who did not look obsessively at failures in order to build good bridges tended to make the same mistakes over again. “Failure aversion” rather than just “success extension” is the way to go for building robust systems.

Why is arXiv not a model that fits all? Is there an alternative that works for disciplines other than physics and such who are comfortable with multiple publication? Some researchers (e.g., protein chemists) may share data but not writing, and other novel combinations occur as well.

eBiomed should have worked as well as arXiv. When a call for comment went out, several bio association heads (Federation of American Societies for Experimental Biology) came out strongly against such an idea. They feared that membership would jump ship if their private journals were not a draw. PNAS had a different objection, saying that each paper on the site would have to be refereed so as not to draw the general quality downward.

PubMedCentral, the ultimate result, is run by associations and publishers. There were not enough problems to drive folks to the streets. There was not a cultural history of sharing in the biosciences. And many valued journals as an attention filter. In other words, there were significant social and cultural barriers that did not exist in the physics case.

Kling is looking for a middle ground between peer review and no review at all. This is his “guild model”: pulling together existing working papers, and the like, and making them more widely exchangeable.

The idea is to move toward the model of Harvard’s business school working papers: career-review, not peer review–granularity is at the person or program, not the individual papers.

DOE links together a large number of linked archives into a pre-print network. By drawing on these “guild centers” in each discipline, there can be an efficient exchange of pre-prints. Why are Info Sciences “Cheering on without joining in”? Need to fix that.

I asked Kling at the break about the potential problems with the “guild model” in terms of reputation reinforcement. He glossed over this as a potential problem several times, but I think it may be more than a bug. This is potentially a very bad thing, reinforcing the already too-inbred and introspective disciplines. It also favors the star system, rather than merit of thought or analysis. This would be a giant step backward, in my humble opinion. It looks as though he is aiming for the default system, what might end up happening if other approaches are not taken. This is a great warning against the danger of velleity, but he certainly could not sell me on the advisability of such a system.

Institutional Venue for Electronic Publication and Distribution of Scholarly Content

Maria Bonn spoke a bit about the role of the “Scholarly Publishing Office” at the University of Michigan. In particular, their efforts to reuse patterns and software were admirable.

Susan Gibbons spoke about the effort to do a large scale trial run of DSpace at the University of Rochester. The provost there drove the effort, and there was a wide range of interests for archiving: dissertations, working papers, performance recordings from the 40s, conference proceedings. Wanted something flexible enough to handle this all.

Commercial systems were great for librarian built archives, including (expensive e-pubs bit), while open source systems were both free and extensible. Started with eprints, but ended up with DSpace because it could do a lot more in terms of providing services. It was highly flexible, format neutral, and could create “communities.”

They are part of the DSpace federation of universities doing a pretty major public beta. They should be up and running May 19.

Finally, Timothy Stephen spoke a bit about the (long) history of CIOS. I was familiar with much of this, but it was interesting to think about how early CIOS was, and how little it is recognized. It seems, in some ways, to be emblematic of the discipline of communication. Talked to him a bit after: hadn’t realized how involved Tom Jacobson was with the CIOS project.

So, there are some brief notes. Overall, I found this to be well worth the day spent. Perhaps because I am somewhat disconnected from the libraries, much of this was a pleasant surprise for me. I know there was some talk of automating CVs floating around in blog-space recently. It seems if we are to do this, we should take away the lessons of eprints (and look for interoperability), and should look at ways microcontent can help to glue together an open publishing scheme.

5 Comments

Dorothea Salo

Posted 4/9/2003 at 8:00 am | Permalink

Thanks for the summary. Appreciate it.

Alex

Posted 4/9/2003 at 11:19 am | Permalink

My pleasure. Though in the light of day, it seems I should have edited it rather than just doing a PDA dump :).

Stevan Harnad

Posted 4/12/2003 at 7:41 am | Permalink

Not a bad summary, but a few major errors that could mislead readers.
The biggest is that you refer to self-archiving as self-publishing and
nothing could be further from the truth: Self-archiving is the
depositing of *published* papers, both before and after peer review
(preprints and postprints) into the researcher’s own university eprint
archive. Calling it self-publishing gets it all wrong, and omits the
all-important role of peer-review, which is the essential service that
refereed-journal publishers (20,000 of them) provide today, and will
continue to provide, even when all papers are accessible toll-free
through author self-archiving.

Toll-access costs $2000 per paper, and provides access only to those
would-be users whose universities can afford the tolls. All researchers
want all would-be users to access their work, so they can read, use,
cite, apply and otherwise build upon it. That is “research impact” and
it is why these authors do and give away their research in the first
place. They can remove the needless and obsolete impact-loss caused by
toll-access barriers by self-archiving their research. The journals
provide only one essential service, and that is peer review. (They
implement the service only, as autonomous, 3rd party service providers
for authors and readers; the peers themselves do the peer review for
free.) The cost of implementing peer review is $500 per paper. If and
when (but only if and when) self-archiving should reduce the market for
the journals’ toll-access product to where it can no longer cover that
essential $500 expense, universities will already have more than enough
annual windfall savings from each one’s contribution to the collective
$2000 tolls per *incoming* paper to pay the $500 per *outgoing* paper to
cover the peer review.

But that is all just hypothetical speculation about a future “what if”.
What is immediately needed now, to stop the needless and
counterproductive loss of research impact from potential users whose
universities cannot afford the tolls (no university can afford anywhere
near all of the 20,000 refereed journals, most can afford only a tiny
subset) is to provide an open-access version to those who cannot afford
the toll-access, via self-archiving.

That’s the corrected version of my talk, which recommended that
universities and research-funding bodies mandate self-archiving of all
university refereed research output to maximize research impact and
benefits for all.

The point about peer review is that there is no reason that the price
paid for open access should be that the quality of the peer-reviewed
research to which we want to free access for all would-be users should
be put at risk or compromised in any way by coupling it with untested
alternatives to peer review. Self-archiving has been tested for over 12
years, and it works, and works dramatically. Peer-review alternatives
(like the “guild” system, and many other such speculative proposals)
have not been tested, and are merely notions. Moreover, they are 100%
unnecessary. Self-archiving is a *supplement* to the current journal
system, not a *substitute* for it. And especially not for peer-review,
which is its most important component: Paper publication will diminish
with time, so will the publisher’s proprietary PDF page images and
online add-on enhancements, but peer-review (until/unless a better
alternative is found, tested, and proved to work at least as well) is
here to stay. So self-archiving is about authors’ universities providing
open access to their own *peer-reviewed* (= published) research output,
to maximize its impact; not about self-publishing non-peer-reviewed
preprints only, instead of both the preprints and their peer-reviewed,
published successors, postprints, which is self-archiving, not
self-publishing.

By the same token, the talk by Catherine Candee of U. California
Digital Libraries conflated a university’s self-archiving of
peer-reviewed, published research output with the university’s
self-publishing of its own research. Some university “in-house” journals
may be possible, but they cannot and will not replace the 20,000
autonomous peer-reviewed journals that exist, for the simple reason that
universities cannot be their own peer-policemen: They haven’t the
expertise, and it would quickly be recognized as self-serving and
unreliable. Peer review depends on independent worldwide expertise. One
cannot write one’s own recommendation letters.

Universities mix up the self-archiving and self-publishing agendas
because of revenue problems: They spend huge amounts on toll-access to
journals. They (mistakenly) call this “buying back what we give away.”
But this is completely incorrect! They don’t buy back what they give
away. (They have what they give away already, and no journal publisher
has or can have any objection to in-house uses by the author’s
institution of the author’s own work!) So universities don’t spend their
serials budgets on “buying back” the research output they give away,
though it is true they give away their own research output. They *buy-in*,
through publishers’ access-tolls, the research output of *other*
universities, research output that those universities have themselves
given away (to their journal publishers).

So the “buyback” lament is the wrong one (though there is a rightful
lament to be made, just not that one). The rightful lament is that
*access* to universities’ own giveaway by would-be users at *other*
institutions is needlessly blocked by access-tolls. But the remedy to
that is not for universities to go into the toll-access self-publishing
business (even if their tolls are lower)! It is to self-archive their
own refereed research, thereby making it accessible toll-free to all
users. The enhanced research impact is already reward enough for that.
(It means enhanced research productivity, funding, prestige, prizes.)
But possibly — and only possibly — it might lead to windfall
toll-access savings on their periodicals budget, if/when open-access
shrinks the market for toll-access, tipping the transition to up-front
peer-review service charges, per outgoing paper, instead of the present
toll-access charges, per incoming paper. Self-archiving is reciprocal
across institutions.

(Moreover, I doubt that even a system as big as U of C provides 10% of
the journal content it buys in! And the fact that its own researchers
are the editors of 12% of the journals it buys in, if true — I think
that’s an overestimate too! — is also completely irrelevant, as those
journals are independent, 3rd party journals, not U of C in-house
journals; nor should they be, if peer-review is to continue to be an
autonomous, expertise-based system, rather than a local vetting
system.)

Not just U of C, but also Michigan and Rochester, in their
presentations, are conflating the important and immediate and clearly
remediable problem of research impact loss owing to access-denial —
solved completely by self-archiving — with various other university
revenue-generation schemes, such as the founding of new in-house
journals, profiting from new online economies. There may be room for a
few new journals to add to the existing 20,000, if they can establish
sufficiently high-quality peer-review standards, but that is not the
solution to the access-denial problem (if the new journals are likewise
toll-access) and it is still a very risky proposition if the new
journals are open-access journals, charging for peer review (before the
windfall toll-cancellation savings from the existing 20,000 have had a
chance to build up).

So the message is: Don’t mix up self-archiving with self-publishing,
don’t tamper with peer-review until/unless you have a tried and true
alternative, and in the meanwhile, don’t delay with the self-archiving
of your own research: Your research impact is being needlessly lost
daily.

The URL for the free eprints archive-creating software is incorrect
in the summary: It should be: http://www.eprints.org

Stevan Harnad

Alex

Posted 4/12/2003 at 9:21 am | Permalink

Thank you very much for taking the time to clarify the errors. I greatly appreciate it. As I said, much of this is new (and exciting) to me, so I am not surprised that I misread some of your comments. I was particularly intrigued by your presentation, and so I especially appreciate your clarification.

AKMA

Posted 4/12/2003 at 1:20 pm | Permalink

Thanks for the summary and clarifications. This is a tremendous part of what we’re trying to do over at the Disseminary (http://disseminary.org); I wish I could have come to the conference. Maybe next year. . . .
