Google Blog

“Do you hear that, Mr. Anderson? That is the sound of inevitability.” – Agent Smith, The Matrix

Within hours of Google’s announcement that they were beginning an initiative to digitize the largest libraries in the US, most of the web had heard about it. Of course they had: Google is pretty much central to the global knowledge network.

Already, I think the impact of Google Scholar is being underestimated. I suspect that over the next three or four years, the scholarly search engine will have far-reaching effects on how scholars communicate. But once a significant number of books from the Stanford, Harvard, and New York City libraries are digitized and added to the holdings of current books (Google Print), I think you will start to see some early unintended consequences:

1. Books that have entered the public domain will be cited far more often than those that have not. Since the hard part (digitization) will already have been done, there will be no good reason for libraries, and especially the New York Public Library, not to allow remote access to their digitized collections that have been elevated to the public domain[1].

As a result, lazy people like me are going to be more likely to cite the materials they can access immediately. We will have a mass rediscovery of fin-de-siècle scholarship.

2. Digitized books want to be free. It will be interesting to see how long it takes for all of these books to break out onto the p2p systems. Sure, it hasn’t happened with Amazon yet, but (come on!) it will. It’s just too juicy a target for educational Robin Hoods. And if the source code of Half-Life 2 can be stolen, then it is a question of when, not if, the digitized books will be pirated away[2].

I, for one, welcome both of these. And in the meantime, the intended consequences are amazing to think of.

fn1. From now on, I am going to avoid the downward connotations of “falling out of copyright.”

fn2. Yes, I am fully aware that these won’t fit on someone’s jump drive. I am also aware that storage and transfer sizes continue to increase, and it takes just one copy to make it out.



  1. Posted 12/16/2004 at 10:30 am | Permalink

    Why do you talk of “pirating away” these PD books? As far as I know, the libraries have no intention of keeping them from going into the public domain and being available anywhere. Nor (to my relief) does Google. Apparently the license is non-exclusive, and I read somewhere that in at least one case Google will leave the library, post-digitisation, with the text in the form of a website so people can browse it as well as search it.

  2. Posted 12/16/2004 at 2:29 pm | Permalink

    Should have been clearer… While the libraries are scanning many public domain books, they are scanning mostly books that are still under copyright (and either in or out of print). Both Stanford and Michigan have pledged to allow a scan of their entire libraries’ contents, and I doubt the majority of that is in the public domain. It’s pretty clear that neither Google nor the libraries are likely to offer these freely online, outside of Fair Use chunks for revealing the results of a search.

    Google has already promised to make the public domain books available in full text. Since copyright in the US and EU extends to at least 70 years after the death of the author (95 years in the cases at hand in the US), pretty much anything published after 1900 is likely to still be under copyright. Which is why I think (under #1) that we are going to see a lot more citation of hundred-year-old books over the next decade or two. Standing on the shoulders of giants, still, I suppose, but not on the shoulders of those standing on the shoulders of giants.

  3. Posted 12/17/2004 at 12:31 am | Permalink

    It would be wonderful for my computer to read Mark Twain!

    Who’s going to “filter” what books are digitized and put onto the p2p systems?

  4. Posted 12/17/2004 at 9:51 am | Permalink

    Donaven: For some of Mark Twain’s work, you already can. Project Gutenberg has been scanning books for at least a decade, including quite a bit of Twain (even in mp3 audio, something new since I last looked at the project).

  5. Posted 12/22/2004 at 12:44 pm | Permalink

    It reminds me of Chou Enlai’s (周恩來) famous remark: asked about the historical implications
    of the 1789 French Revolution, he replied, “It’s too soon to tell.”
    Most of the books to be digitized are more than 80 years old, but, as Chou Enlai’s remark suggests, aren’t they still very relevant? I guess they are!

  6. Posted 7/19/2005 at 7:52 pm | Permalink

    This is a very exciting project Google is taking on, but it will kill sites like mine :(

    Public Domain Books: Online –

2 Trackbacks

  1. By Blog de Viajes on 12/17/2004 at 9:38 am

    On searches and papers: Google is changing the landscape of the academic world.
    Alex Halavais writes today about the consequences that Google’s recent initiatives will have for the academic world. On the one hand, the introduction of Google Scholar, which, although it has so far had a rather small impact, will in the medium…

  2. […] Frankly, this is the part of the project that I find most exciting, and those involved in the project must recognize that while “Googlifying” the physical library is an exciting project in itself, the “byproduct” of this, an immense, digitized store of human knowledge, is far from negligible. Indeed, as I have noted before, such a library becomes the largest potential pirate’s booty in the history of the internet. The question is not whether the information will be liberated, but how long that will take. […]
