Look who’s talking: New technology allows computers to talk back
December 9, 2000 | 12:00am
Is it pure science fiction when humans speak to inanimate objects and receive responses through action or speech? Was it only a Hollywood fantasy that led us to believe in Star Trek crews talking to their computers and in HAL, the talking computer of 2001: A Space Odyssey?
Not anymore. As science and technology carry us into the Information Age, talking to a computer instead of tapping the keys of its keyboard is thoroughly nonfiction.
Victor Zue, an expert on spoken language systems, explains that as products get smaller and smaller, you’ll need a different way of controlling your computer. "You can’t type with toothpicks," he said.
Lernout & Hauspie (L&H) of Belgium, a leading provider of text-to-speech, language, and artificial intelligence solutions, has managed to gobble up one of the few speech recognition companies.
In a strategic move to increase the breadth and depth of its technological solutions, L&H acquired Dragon Systems, Incorporated, to spread its wings across multiple vertical markets.
But even before this development, Dragon Systems had positioned itself as a leading maker of voice dictation technology in the US.
Back in the 1980s, speech recognition meant having to speak in a halting, robotic rhythm because computers could not follow the nuances of a normal voice. Natural speech came out garbled at best. Dragon Systems’ technology is said to be one of the better-debugged systems, allowing users to dictate natural speech into their computers at up to 160 words per minute without touching a keyboard.
Quite surprisingly, only one Philippine-owned company had the prescience to forge a strategic partnership with Dragon Systems.
Zeus Technologies, a two-year-old company, is owned and managed by 30-year-old computer scientist Zeus Villanueva.
While pursuing his master’s degree at the University of Massachusetts in Boston, Villanueva obtained practical training in Windows development at the US headquarters of Xerox.
"We were basically developing software where a user is able to scan something and edit it. Since Xerox spends a lot of its resources on information technology research, I learned much about pattern recognition technologies," Villanueva said.
Pattern recognition here chiefly means optical character recognition (OCR), the technology of converting paper documents into electronic documents, paired with photo editing and the manipulation of color images.
Voice recognition technology is very similar to OCR. Its many applications include voice dictation programs such as those developed by Dragon Systems.
Even before such programs became Internet-friendly, Zeus and his company had already ventured into software that did just that: email integration. Spoken commands such as "Check my mail", "Read my mail", "Previous message", and "Next message" are picked up and recognized by Zeus’ system, which follows through by retrieving all new messages and having a voice from your computer read your mail aloud.
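For readers curious how such commands might map to actions, here is a minimal sketch in Python. It is not i-Talk's code: the class name `MailReader`, its method, and the reply wording are all hypothetical, and it assumes a speech recognizer has already turned the spoken phrase into text.

```python
class MailReader:
    """Hypothetical inbox cursor driven by already-recognized phrases."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.position = 0  # index of the message currently being read

    def handle(self, phrase):
        """Return the text the computer would speak in reply."""
        command = phrase.strip().lower()
        if command == "check my mail":
            return f"You have {len(self.messages)} new messages."
        if command == "read my mail":
            self.position = 0  # start from the first message
        elif command == "next message":
            # Stay on the last message instead of running past the end.
            self.position = min(self.position + 1, len(self.messages) - 1)
        elif command == "previous message":
            # Stay on the first message instead of going below zero.
            self.position = max(self.position - 1, 0)
        else:
            return "Sorry, I did not understand that."
        if not self.messages:
            return "You have no messages."
        return self.messages[self.position]


reader = MailReader(["Hello from Ana", "Meeting at 3"])
print(reader.handle("Check my mail"))
print(reader.handle("Read my mail"))
print(reader.handle("Next message"))
```

A real system would pass each reply to a text-to-speech engine rather than print it.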
This may sound eerily dubious, but leave it to Zeus to tweak the system. "Since it’s hard to talk to a computer, we added cartoon characters," he quips. "There’s Merlin the Magician, there’s Genie. Just so you have someone to talk to. It’s as if he’s the one you’re talking to. He’s the one who opens applications for you. He’s the one who looks for the website you want," Villanueva explained.
Sure enough, when we tested the system, as soon as a voice command was given to a transparent orange computer (one of four colors, along with blue, violet and light green, similar to the 2000 iMac models that have burst onto the retail scene of late), a brilliantly robed Merlin the Magician popped up in three dimensions and started talking to us in a low, commanding voice.
Microsoft owns characters like the ‘Paper Clip’ and ‘Einstein’ wizards that act as helpers. A simple letter from Villanueva requesting permission to use Microsoft characters met with little resistance from the computer giant.
"Actually, there were not so many restrictions, because Microsoft wants its characters to be used, since that’s marketing. They also want us to develop software using Windows. In fact, they encourage development," Villanueva said.
Zeus has bundled his software, labeled i-Talk, into a computer drive that goes for the same price as computers that include Windows 98 but have no pre-installed voice recognition technology. The software eases your daily tasks (imagine being free to read a book, watch TV, cook, wash your hands, and hey, even clean the house!) because "someone" can open your email messages and read them to you. And it promises much more.
Like Dragon Systems’ programs, i-Talk also launches applications such as Word. Say "Open Microsoft Word" and Merlin says back, "Opening Microsoft Word." Say "Open Microsoft Excel" and Merlin says back, "Opening Microsoft Excel." No need to push any keys on a keyboard, never mind swishing and clicking a mouse around a mouse pad.
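The exchange above follows a simple pattern: the word "Open", then an application name, then a spoken confirmation. A rough sketch of that pattern in Python follows. The function `respond_to` and the list `KNOWN_APPS` are hypothetical names used only for illustration, the phrase is assumed to arrive as already-recognized text, and the sketch only builds the reply without launching anything.

```python
# Hypothetical list of applications the assistant knows how to open.
KNOWN_APPS = ("Microsoft Word", "Microsoft Excel")


def respond_to(phrase, known_apps=KNOWN_APPS):
    """Return the spoken confirmation for an "Open <app>" phrase, or None."""
    text = " ".join(phrase.split())  # collapse stray whitespace
    verb, _, target = text.partition(" ")
    if verb.lower() != "open" or not target:
        return None  # not an "open" command
    for app in known_apps:
        if app.lower() == target.lower():
            return f"Opening {app}."
    return None  # application not recognized


print(respond_to("Open Microsoft Word"))
print(respond_to("Open Microsoft Excel"))
```

Matching the name against a fixed list, rather than accepting any words after "Open", keeps a misheard phrase from triggering an action.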
The only limitation Villanueva disclosed is the background noise picked up by an omni-directional microphone. But even that’s been solved.
"The desktop microphones are very sensitive, so we use headsets. They’re uni-directional so they pick up only the sound of your voice and the background noise is lessened," he said.
Another of i-Talk’s capabilities is a set of Internet browsing commands. A user can call up a bookmarked website with a single voice command.
"It is actually more appropriately called voice marks," Villanueva said. This is because the system is limited: one still has to type in the website address and add it as a bookmark by clicking through. Then, when the bookmarked site is called out, the voice mark technology kicks in, searches for the site in the bookmark file, imports it from either Netscape or Internet Explorer, and brings it up on the screen.
Already, a group of industry biggies, including AT&T and Sun Microsystems, has forged a coalition aimed at flooding the global market with intelligent devices for homes and offices around the world. Not only will PCs be connected, but mobile car phones, PDAs, and TVs as well. Their goal: build the Smart House.
A Smart House is defined as one that is voice recognition-enabled. Tell your refrigerator to defrost and it automatically adjusts its thermostat. Tell your coffee maker to brew your morning mug and in a few minutes, you’re sipping java. Tell your microwave to zap your meal for four minutes and voila! Whereas the first interfaces were a keyboard, then a mouse, and then a cat (or are we skipping the bar code kitty, which is set to launch in 2001?), the human voice is set to be the new frontier.
Kuo Shih-Chung, manager of Taiwan’s government-funded speech recognition project, sees computers not only taking dictation, following voice commands, and speaking, but carrying on a conversation as well.
"You’ll be able to ask your computer, ‘How many e-mails do I have today?’ and it will answer with a number, then tell you, ‘The first is from your brother. Should I read it to you?’" Villanueva predicts.
While Taiwan’s Industrial Technology Research Institute (ITRI) believes this can actually happen by relying on word context and semantics, researchers at US-based Carnegie Mellon University believe that the software they are developing will have lip-reading facilities to boost its effectiveness.
Allowing a computer to understand human speech and talk back does conjure images of a Trekkie convention where computer robots walk and talk. Pundits claim this is it for now, brainwave communication notwithstanding.