Battle over books
The verbal boxing match over Google Library at the New York Public Library on Thursday (there is a QuickTime stream of the debate at that site) was a bit more lively than most scholarly roundtables. At times, it seemed like the audience’s champagne might have been spiked with a bit of Jerry Springer juice.
The majority of the discussion, both the meatiest and most discursive bits, occurred between Allan Adler and Lawrence Lessig. Lessig has his own post mortem, but a few things really stuck out for me, and these were threads that were largely left hanging.
Certainly, the take-away for me was that “fair use” needs to be re-defined for digital media. Or rather, “fair use” needs to be defined. What none of the parties would concede is that one of the difficulties in translating fair use into the digital age is that it is fairly ambiguous in the analog age. There are certain uses that are clearly covered by fair use, certain uses that are clearly infringing, and a not insignificant number of uses that are really at the whim of whatever judge hears the case, if it gets that far.
But something that struck me even more directly was an overlooked difference of opinion. At one point, David Drummond (the representative from Google) claimed that “a digital card catalog requires a full copy” of the original texts. I quickly wondered whether this claim was true, because if it is not, it yields a much more interesting, and potentially tractable, description of the disagreement between Google and the publishers.
Since you do not have the benefit of (Wired editor) Chris Anderson’s frequently proffered visual aid (which led to a bit of eye rolling after it was produced a fourth and fifth time), let me start by reminding you of what Google Print Library (Google Book Search as of Thursday) shows you from copyrighted, in-print works. Amazon’s “search inside the book” actually reproduces the pages around the word or phrase you search for. Google provides only a certain number of words on each side, what they call a “snippet” of text.
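To make the difference concrete, here is a minimal sketch of what a word-bounded snippet looks like; the function name and the ten-word default window are my own assumptions for illustration, not Google’s actual parameters:

```python
def snippet(text, query, context_words=10):
    """Return up to `context_words` words on each side of the first
    word containing `query`, or None if the query is absent.
    An illustrative sketch of snippet display, not a real product's logic."""
    words = text.split()
    q = query.lower()
    for i, w in enumerate(words):
        if q in w.lower():
            start = max(0, i - context_words)
            end = min(len(words), i + context_words + 1)
            return " ".join(words[start:end])
    return None
```

The point is that the reader only ever sees a few words of context, unlike a full facsimile page.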
Now, I could be wrong, but it seemed that all parties were relatively OK with this use, that they felt that it fell within the bounds of fair use. They did not articulate this precisely on the publisher/author side, but they instead focused on the way that this index was obtained: by scanning the full text of thousands of books. Even if this library never left Google HQ, it could easily be argued, as Adler said several times that night, that by scanning the books of several excellent research libraries, Google will have accumulated the largest digital library in the world. This hasn’t happened for free — scanning books is expensive — but it has happened without further money being paid to the publishers.
Frankly, this is the part of the project that I find most exciting, and those involved in the project must recognize that while “Googlifying” the physical library is an exciting project in itself, the “byproduct” of this, an immense, digitized store of human knowledge, is far from negligible. Indeed, as I have noted before, such a library becomes the largest potential pirate’s booty in the history of the internet. The question is not whether the information will be liberated, but how long that will take.
There is an alternative. There is no reason that Google should have to keep the original files. Word frequency “thumbprints” are enough to do the search, and it is possible to store the snippets without keeping them in their original format. Of course, if you store the snippets, it is only a matter of time before they can be reassembled, but at least you get away from the “complete copy” issue.
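The claim can be illustrated with a toy positional index: once each word is mapped to the places it occurs, a word search no longer needs the scanned text itself. This is a sketch of the general idea only; `build_index` and `search` are hypothetical names, and real search engines are vastly more elaborate:

```python
from collections import defaultdict

def build_index(docs):
    """Build a positional inverted index mapping each lowercase word
    to a list of (doc_id, position) hits. After this step, the index
    alone answers word queries; the source text need not be retained."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

def search(index, word):
    """Return all (doc_id, position) hits for a single word."""
    return index.get(word.lower(), [])
```

Whether such a derived index still counts as a “copy” for copyright purposes is, of course, exactly the legal question the debate left open.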
Alternatively, I bet Google would avoid a world of hurt by buying rather than borrowing these books. I think the authors and publishers would be happier with a more extensive licensing agreement, but they would have a lot less to complain about if Google bought duplicates of the books (those that are in print) that they will be scanning. Worst case, how much could this cost? $60-$100 million?
In any case, it was an interesting discussion, and it wouldn’t have been if there hadn’t been someone there to at least volley with Lessig. I know he already has a lot of fans, but I have often been disappointed by the ability of those who write well to form their opinions “on the fly.” This is clearly not a problem for Lessig.