Google Blog

By alex | Published: 12/16/2004

“Do you hear that, Mr. Anderson? That is the sound of inevitability.” – Agent Smith, The Matrix

Within hours of Google’s announcement that it was beginning an initiative to digitize the largest libraries in the US, most of the web had heard about it. Of course it had: Google is pretty much central to the global knowledge network.

Now, already, I think the impact of Google Scholar is being underestimated. I suspect that over the next three or four years, the scholarly search engine will have far-reaching effects on how scholars communicate. But once a significant number of books from the Stanford, Harvard, and New York City libraries are digitized, and added to the holdings of current books (Google Print), I think you will start to see some early unintended consequences:

1. Books that have entered the public domain will be cited far more often than those that have not. Since the hard part (digitization) has already happened, there will be no good reason for libraries, and especially the NYC public library, not to allow remote access to the digitized collections that have been elevated to the public domain[1].

As a result, lazy people like myself are going to be more likely to cite the materials they can access immediately. We will have a mass rediscovery of fin-de-siècle scholarship.

2. Digitized books want to be free. It will be interesting to see how long it takes for all of these books to break out onto the p2p systems. Sure, it hasn’t happened with Amazon yet, but come on: it will. It’s just too juicy a target for educational Robin Hoods. And if the source code of Half-Life 2 can be stolen, then it is a question of when, not if, the digitized books will be pirated away[2].

I, for one, welcome both of these. And in the meantime, the intended consequences are amazing to think of.

fn1. From now on, I am going to avoid the downward connotations of “falling out of copyright.”

fn2. Yes, I am fully aware that these won’t fit on someone’s jump drive. I am also aware that storage and transfer sizes continue to increase, and it just takes one to make it.

This entry was posted in Uncategorized and tagged Law & Policy, Technology.

6 Comments

David Brake

Posted 12/16/2004 at 10:30 am

Why do you talk of “pirating away” these PD books? As far as I know, the libraries have no intention of trying to keep them from going into the public domain and being available anywhere. Nor (to my relief) does Google. Apparently the license is non-exclusive, and I read somewhere that in at least one case Google will be leaving the libraries post-digitisation with the text in the form of a website so people can browse as well as search it.

Alex Halavais

Posted 12/16/2004 at 2:29 pm

Should have been clearer… While the libraries are scanning many public domain books, they are scanning mostly books that are still under copyright (and either in or out of print). Both Stanford and Michigan have pledged to allow a scan of their entire libraries’ contents, and I doubt the majority of that is in the public domain. It’s pretty clear that neither Google nor the libraries are likely to offer these freely online, outside of Fair Use chunks for revealing the results of a search.

Google has already promised to make the public domain books available in full text. Since copyright in the US and EU extends to at least 70 years after the death of the author (95 years in the cases at hand in the US), pretty much anything published after 1900 is likely to still be under copyright. Which is why I think (under #1) that we are going to see a lot more citation of hundred-year-old books over the next decade or two. Standing on the shoulders of giants, still, I suppose, but not on the shoulders of those standing on the shoulders of giants.

donaven

Posted 12/17/2004 at 12:31 am

It would be wonderful for my computer to read Mark Twain!

Who’s going to “filter” which books are digitized and put onto the p2p systems?

Alex Halavais

Posted 12/17/2004 at 9:51 am

Donaven: For some of Mark Twain’s work, you already can. Project Gutenberg has been scanning books for at least a decade, including quite a bit of Twain (even in mp3 audio, something new since I last looked at the project).

theglobalchinese

Posted 12/22/2004 at 12:44 pm

It reminds me of Chou Enlai’s (周恩來) famous remark: asked about the historical implications of the 1789 French Revolution, he replied, “It’s too soon to tell.”
Most of the books that will be digitized are more than 80 years old but, as Chou Enlai’s remark suggests, aren’t they still very relevant? I guess they are!

Dave M

Posted 7/19/2005 at 7:52 pm

This is a very exciting project Google is taking on, but it will kill sites like mine :(

—
Public Domain Books: Online – http://www.PDBooksOnline.com

2 Trackbacks

By Blog de Viajes on 12/17/2004 at 9:38 am

On searches and papers: Google changes the landscape of the academic world.
Alex Halavais writes today about the consequences that Google’s recent initiatives will have for the academic world. On the one hand, the introduction of Google Scholar, which, although it has had a rather small impact so far, will in the medium term…

By Alex Halavais » Blog Archive » Battle over books on 11/19/2005 at 12:59 pm

[…] Frankly, this is the part of the project that I find most exciting, and those involved in the project must recognize that while “Googlifying” the physical library is an exciting project in itself, the “byproduct” of this, an immense, digitized store of human knowledge, is far from negligible. Indeed, as I have noted before, such a library becomes the largest potential pirate’s booty in the history of the internet. The question is not whether the information will be liberated, but how long that will take. […]
