An article in the New York Times (Could I Get That Song in Elvis, Please?) introduces a new technology called “Vocaloid” that uses a singer’s performed phonemes to replicate his or her voice. The process is referred to as making voice “fonts,” and while only two exist so far, the article suggests that it may become possible to “rip” such fonts from singers with a large existing corpus. It also predicts that the technology will be relatively cheap and will reach consumers soon.
Of course, I want to play with it. It’s far from perfect (download samples here), but it seems to be the first step toward a refinement that will allow you to put words in people’s mouths. It’s already very difficult to trust images, yet we still feel instinctively that voice recordings can be trusted. I want to emphasize the upside: a great new tool that can increase the creative output of music professionals and amateurs alike. But it does, once again, separate us from our traditional view of human skill. Just as word processors have eliminated the need to know how to spell all but the most commonly used words, and the calculator has moved calculation from the head and hands into a machine, children (and, unfortunately, school administrators) may begin to ask whether it is really necessary to teach voice. Either way, it would be a mistake to dismiss this as a mere novelty.