Battle over books

The verbal boxing match over Google Library at the New York Public Library on Thursday (there is a QuickTime stream of the debate at that site) was a bit more lively than most scholarly roundtables. At times, it seemed as though the audience's champagne might have been spiked with a bit of Jerry Springer juice.

Most of the discussion, both the meatiest and the most discursive bits, took place between Allan Adler and Lawrence Lessig. Lessig has his own post-mortem, but a few things really stood out for me, and these were threads that were largely left hanging.

Certainly, the takeaway for me was that "fair use" needs to be redefined for digital media. Or rather, "fair use" needs to be defined. What none of the parties would concede is that one of the difficulties in translating fair use into the digital age is that it is fairly ambiguous even in the analog age. There are certain uses that are clearly covered by fair use, certain uses that are clearly infringing, and a not insignificant number of uses whose status is really at the whim of whatever judge hears the case, if it gets that far.

But something struck me even more directly: an overlooked difference of opinion. At one point, David Drummond (the representative from Google) claimed that "a digital card catalog requires a full copy" of the original texts. I immediately wondered whether this claim was true, because if it is not, it yields a much more interesting, and potentially tractable, description of the disagreement between Google and the publishers.

Since you do not have the benefit of (Wired editor) Chris Anderson's frequently proffered visual aid (which led to a bit of eye-rolling after it was produced a fourth and fifth time), let me start by reminding you of what Google Print Library (Google Book Search as of Thursday) provides from copyrighted, in-print works. Amazon's "search inside the book" actually reproduces the pages around the word or phrase you search for. Google provides only a certain number of words on each side of the match, what it calls a "snippet" of text.
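To make the distinction concrete, here is a minimal Python sketch of snippet-style output: a fixed number of words on either side of each match, rather than whole pages. The function name `snippets` and the default window of three words are illustrative assumptions; Google has not said how long its snippets are or how it chooses them.

```python
import re


def snippets(text: str, term: str, window: int = 3) -> list[str]:
    """Return a short excerpt around each occurrence of `term`.

    `window` (words kept on each side of the match) is an illustrative
    assumption, not Google's actual snippet length.
    """
    words = text.split()
    target = term.lower()
    results = []
    for i, w in enumerate(words):
        # Compare case-insensitively, ignoring punctuation attached to the word.
        if re.sub(r"\W", "", w).lower() == target:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            results.append(" ".join(words[start:end]))
    return results
```

For example, `snippets("the quick brown fox jumps over the lazy dog", "fox", 2)` returns `["quick brown fox jumps over"]`: the reader sees the match in context, but never the surrounding page.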

Now, I could be wrong, but it seemed that all parties were relatively OK with this use and felt that it fell within the bounds of fair use. The publisher/author side did not say so explicitly; instead, they focused on the way this index was obtained: by scanning the full text of thousands of books. Even if this library never left Google HQ, it could easily be argued, as Adler said several times that night, that by scanning the books of several excellent research libraries, Google will have accumulated the largest digital library in the world. This hasn't happened for free (scanning books is expensive), but it has happened without any further money being paid to the publishers.

Frankly, this is the part of the project that I find most exciting, and those involved in the project must recognize that while "Googlifying" the physical library is an exciting project in itself, the "byproduct" of this, an immense digitized store of human knowledge, is far from negligible. Indeed, as I have noted before, such a library becomes the largest potential pirate's booty in the history of the internet. The question is not whether the information will be liberated, but how long that will take.

There is an alternative: there is no reason that Google should have to keep the original files. Word-frequency "thumbprints" are enough to do the search, and it is possible to store the snippets without keeping them in their original format. Of course, if you store the snippets, it is only a matter of time before someone reassembles them, but at least you get away from the "complete copy" issue.
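Here is a rough Python sketch of what a thumbprint-only index might look like. It is hypothetical (the class `ThumbprintIndex`, the `thumbprint` helper, and the tokenizing regex are my assumptions, not anything Google has described): each book is reduced to a table of word counts, and the scanned text is thrown away. That is enough to answer "which books mention this word, and how often," but word order is gone, so the book cannot be rebuilt from it.

```python
import re
from collections import Counter


def thumbprint(text: str) -> Counter:
    """Reduce a text to word counts; word order (and so the text) is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class ThumbprintIndex:
    """Hypothetical index that keeps only per-book word counts."""

    def __init__(self) -> None:
        self.books: dict[str, Counter] = {}

    def add(self, title: str, text: str) -> None:
        # Only the counts are stored; the scanned text itself is not kept.
        self.books[title] = thumbprint(text)

    def search(self, term: str) -> list[tuple[str, int]]:
        """Return (title, occurrences) for books containing `term`, most frequent first."""
        term = term.lower()
        # Indexing a Counter with a missing key returns 0 without inserting it.
        hits = [(title, counts[term]) for title, counts in self.books.items() if counts[term] > 0]
        return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

For example, after `idx.add("A", "The whale, the whale!")` and `idx.add("B", "A book about a whale.")`, `idx.search("whale")` returns `[("A", 2), ("B", 1)]`. The same property cuts the other way: because word order is discarded, the counts alone cannot produce snippets, which is why those would have to be stored separately.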

Alternatively, I bet Google would avoid a world of hurt by buying rather than borrowing these books. I think the authors and publishers would be happier with a more extensive licensing agreement, but they would have a lot less to complain about if Google bought duplicates of the books (those that are in print) that it will be scanning. Worst case, how much could this cost? $60–$100 million?

In any case, it was an interesting discussion, and it wouldn't have been if there hadn't been someone there to at least volley with Lessig. I know he already has a lot of fans, but I have often been disappointed in the ability of those who write well to form their opinions "on the fly." This is clearly not a problem for Lessig.

    Great post… It's so easy for so much to be lost or misrepresented in this debate.

    However, I'm not sure I take your point about only needing the word-frequency "thumbprints"… what if Google wanted to incorporate more powerful search indexing or operators in the future (such as the Lexis w/n operator, which finds words within n words of each other)? Unless I'm missing something, and I frequently am, word-frequency information wouldn't be enough… and damn if it wouldn't be a pain in the ass to do the scanning again!
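    The commenter's objection can be illustrated with another hypothetical Python sketch. A proximity query like Lexis's w/n needs to know where each word occurs, not just how often, and two texts with identical word counts can give different answers. The function names `positional_index` and `within`, and the tokenizer, are illustrative assumptions.

```python
import re
from collections import defaultdict


def positional_index(text: str) -> dict[str, list[int]]:
    """Map each word to the (0-based) positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos, word in enumerate(re.findall(r"[a-z']+", text.lower())):
        index[word].append(pos)
    return dict(index)


def within(index: dict[str, list[int]], a: str, b: str, n: int) -> bool:
    """True if `a` and `b` occur within `n` words of each other (an a w/n b query)."""
    pa = index.get(a.lower(), [])
    pb = index.get(b.lower(), [])
    # Brute-force pairwise check, fine for a sketch; i != j stops a single
    # occurrence from matching itself when a and b are the same word.
    return any(abs(i - j) <= n for i in pa for j in pb if i != j)
```

    Here `within(positional_index("fair use of the book"), "fair", "use", 1)` is `True`, while the same query against "fair the book of use" (exactly the same word counts) is `False`. One caveat cuts back toward the original debate: an index that records the position of every word is, in effect, the book's full word sequence, so it sits much closer to a "complete copy" than a frequency table does.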

    I’m just curious how this scenario plays out…

    1) Is this "opt-out" approach to books suitable for other media? If not, what makes books special? Should a band be forced to "opt out" every time a corporation wants to use one of its pieces in a television commercial? What I'm really getting at is what is more feasible for the law to cover: should a copyright holder be forced to monitor any and all media traffic in order to "opt out" of any use they do not approve of? Or does it make more sense to assume that the rights are protected and force the person wishing to use or license the work to determine whether they are permitted to do so? If I release an album, will it be my responsibility to contact every business with enough money to use a song from that album in a television ad or a film, or to distribute parts of it online as part of a "music library", and tell each of them individually yes, no, or to what extent they can use it? Or is it more logical (and easier to legislate and enforce) to have them simply come to me and ask?

    2) Where do moral rights come into play? (I know these are not entirely recognized by the United States, but the United States does not control the Internet, no matter how much it may wish to believe otherwise; authors from England, Canada, and Germany will also be affected, and audiences in those nations will conceivably also have access.) I have heard about plans involving targeted advertising; will an author writing about ocean conservation have the right to object if an advertisement for wholesale frozen fish sticks is placed prominently alongside his work? And will it be his responsibility to monitor the standards of Google's advertising program and object to every one of the millions (or more) of possible combinations of advertisements that Google may produce at any given second? Is such a thing feasible? Should the author have any input at all into the context in which his work is displayed, or does Google have the exclusive right to an opinion in this case as well?

    3) Where does Google's stance on progress come into play with its censorship of searches in China? Are only Westerners to be granted access to this grand library?

    4) The assumption is that authors transfer copyright to publishers; this is not necessarily so. Often it is simply a case of licensing distribution in certain formats and (geographical) markets, *including* digital markets. Does this action taken by Google invalidate contracts between authors and publishers for exclusive digital distribution (in whole or in part)?

    5) Finally, do we really want a private, for-profit corporation holding the keys to the largest warehouse of information (both public and private) in history based on no greater obligation than that of a handshake and the words “trust me”? (Was there not, at some point in the early days of Gmail, an issue about Google’s TOS granting them the right to archive private correspondence long after it has been “destroyed” by the author, and even to give out personal information if it believes it necessary? Does that really engender trust? Do we want *that* corporation – or any corporation – controlling that much information?)

    1) Is this “opt-out” approach to books suitable for other media? …

    The whole "opt-out" thing is a dead end. Google argued that it was under no legal obligation to offer this and was just trying to play nice. Since it doesn't make publishers happy, I'm not sure why Google offers it. It seems to weaken their case.

    2) Where do moral rights come into play…

    That is what is directly in question here. On one side, many publishers and authors take it as given that they have the inalienable right to profit from their original work. Others question whether this monopoly really is a right at all (i.e., whether ideas can be owned). The answer is almost certainly somewhere in between.

    But the point here is that the legal questions mirror–or at least echo–the moral questions.

    3) Where does Google's stance on progress come into play with its censorship of searches in China? Are only Westerners to be granted access to this grand library?

    Certainly an interesting question. I would have less trouble with them blocking the whole thing than with them blocking only some books.

    For example, does anyone know whether the collected letters of Hitler would be legal in Germany? (It doesn’t appear that Mein Kampf is there, though it is widely available elsewhere on the Web.)

    4) The assumption is that authors transfer copyright to publishers; this is not necessarily so…

    This was brought up by one of the discussants. Who has the commercial rights to the material doesn’t particularly matter–fair use would still apply whether the work was “owned” by a publishing house or an individual author.

    5) Finally, do we really want a private, for-profit corporation holding the keys to the largest warehouse of information (both public and private) in history based on no greater obligation than that of a handshake and the words “trust me”?

    Leaving aside the second part of this, yes, I think this is a concern. Google's response: nobody is stopping others from doing it too. Indeed, Yahoo and Amazon (among others) are attempting similar projects, each in partnership with non-profits and educational institutions. The academic libraries gain a copy of their own materials in electronic form.

    Just for the record, when I mentioned "moral rights", I was referring to the following (from Wikipedia):

    “Independently of the author’s economic rights, and even after the transfer of the said rights, the author shall have the right to claim authorship of the work and to object to any distortion, mutilation or other modification of, or other derogatory action in relation to, the said work, which would be prejudicial to his honor or reputation.”

