Good Listener

Dimitri Kanevsky helps computers understand human speech, a surprising line of work for someone who has been deaf since age 3. The odds are good that if your computer can transcribe a voice or your cell phone knows which number to dial when you tell it to "Call Julie," Kanevsky, 50, is partly responsible. His algorithms are at work inside much of the software that helps translate human speech into the digital language that machines understand.

The holy grail of speech science — making conversation with a computer easier than typing — is still a long way off. Today's software makes lots of mistakes, and its main market is users who have no other option, including office workers who suffer from carpal-tunnel syndrome. Yet progress is being made, and Kanevsky's technology is sneaking into daily life. His employer, IBM, and competitors Nuance and SpeechWorks offer enterprise products that replace those endless touch-tone-phone menus with a computerized attendant that can connect you directly to the right person. The big users are banks and airlines, which use the software to let callers book flights automatically. About $240 million worth of speech software was sold in 2001, according to tech research firm IDC. It expects sales to surpass $1 billion by 2006.

Kanevsky is an unlikely player in this industry. He had to get a special waiver to study math at Moscow State University, where no accommodation was made for him. He learned to lip-read teachers and fellow students, and still relies on lip-reading — in English and Russian — rather than the sign language used by most deaf people.

Kanevsky's first invention, designed while he was in school, is a wearable motor that translates speech into lower-frequency vibrations that can be felt on the skin. Kanevsky created it to help him learn to lip-read the speech of new acquaintances. He marketed the product through an Israeli company and still wears the original device on his arm. IBM's speech-research team, impressed by his math genius and practical inventiveness, outbid others to bring him to the company's Yorktown Heights, N.Y., research facility in 1986.

Because of his deafness, Kanevsky says, "I considered speech more mathematically and tried to find mathematical patterns rather than acoustic ones." One of his first breakthroughs was to teach computers a new way of picking out individual words in a stream of sound. Kanevsky's mathematical method made it possible for people to talk to ordinary computers without pausing ... after ... each ... word. Derivatives of this algorithm, first published in 1991, are used in many of today's voice-recognition products.
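To see what "picking out individual words in a stream" involves, here is a toy analogy, not Kanevsky's published algorithm: just as a continuous-speech recognizer must find word boundaries in an unbroken flow of sound, this dynamic-programming sketch finds word boundaries in text typed without spaces, using a small made-up vocabulary.

```python
# Toy analogy only: real recognizers score acoustic features
# probabilistically; here the "stream" is text and the vocabulary
# is a hypothetical three-word set.
VOCAB = {"call", "julie", "now"}

def segment(stream):
    """Return one segmentation of `stream` into VOCAB words, or None."""
    n = len(stream)
    best = [None] * (n + 1)   # best[i] = word list covering stream[:i]
    best[0] = []              # empty prefix segments trivially
    for i in range(1, n + 1):
        for j in range(i):
            word = stream[j:i]
            if best[j] is not None and word in VOCAB:
                best[i] = best[j] + [word]
                break
    return best[n]

print(segment("calljulienow"))  # → ['call', 'julie', 'now']
```

The point of the analogy is that no pauses (spaces) are needed: the program considers every possible boundary and keeps whichever segmentation works, which is the same dynamic-programming idea, applied to sound rather than letters, that lets recognizers handle naturally flowing speech.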

Still, the technology has been slow to evolve, and IBM nearly killed its speech-recognition project in the early 1990s, when the company was struggling. "We had no product," Kanevsky explains. So he made a grandstand play. He connected recognition software to a telephone and was able to read the words of callers from around the world without the help of a human transcriptionist. "It helped push the morale of speech researchers higher," he says, and impressed senior managers enough to save the project.

Kanevsky's latest device is a tablet-size computer, with microphone attached, that transcribes whatever it hears. The software is "speaker independent," meaning it can transcribe anyone's speech without having to learn the voice of each new speaker. An earlier version of the software was a hit with deaf visitors to Kanevsky's lab, who carried it around on much bulkier laptops. The visitors, who did not read lips, found that the software let them connect more easily with the outside world, making others' speech visible. The new version is small enough to carry in a free hand. It's still experimental, but judging from Kanevsky's past, it may find a market.