What is Speech Recognition?

Using Your Voice as Input Method

Speech recognition is a technology that allows spoken input into systems. You talk to your computer, phone or device and it uses what you said as input to trigger some action. The technology is being used to replace other methods of input like typing, clicking or selecting in other ways. It is a means to make devices and software more user-friendly and to increase productivity.

Speech recognition is used in many applications and areas, including the military, the medical field, robotics, and as an aid for people with impairments (imagine a person who has limited or no use of their hands or fingers).

In the near future, nearly everyone will be exposed to speech recognition as it spreads across common devices like computers and mobile phones.

Certain smartphones are making interesting use of speech recognition. The iPhone and Android devices are examples. With them, you can initiate a call to a contact just by giving a spoken instruction like ‘Call office’. Other commands are also accepted, like ‘Switch on Bluetooth’.
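As a rough illustration of how a spoken instruction can trigger an action, here is a minimal Python sketch. It assumes the speech has already been converted to text by some recognizer, and it only shows the step of matching that text to a command. The function names (`call_contact`, `switch_on`, `handle_transcript`) and the command patterns are hypothetical; they are not taken from the iPhone or Android.

```python
import re

# Hypothetical actions. A real phone would call into its telephony
# or Bluetooth services here instead of returning a string.
def call_contact(name):
    return f"Calling {name}"

def switch_on(feature):
    return f"{feature} switched on"

# Each entry pairs a pattern for the transcribed phrase with an action.
COMMANDS = [
    (re.compile(r"^call (?P<arg>.+)$"), call_contact),
    (re.compile(r"^switch on (?P<arg>.+)$"), switch_on),
]

def handle_transcript(transcript):
    """Match transcribed speech against known commands and run the action.

    Returns the action's result, or None if no command matches.
    """
    # Normalize: trim whitespace, lowercase, drop trailing punctuation.
    text = transcript.strip().lower().rstrip(".!?")
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            return action(match.group("arg"))
    return None

print(handle_transcript("Call office"))
print(handle_transcript("Switch on Bluetooth"))
```

An unrecognized phrase returns `None`, so the caller can ask the user to repeat the command instead of guessing.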

Problems With Speech Recognition

Speech recognition, in the form known as speech to text (STT), has also been used for a long time to convert spoken words into text. “You talk, it types”, as ViaVoice would say on its box. But there is one problem with STT as we know it. More than 10 years back, I tried ViaVoice and it did not last a week on my computer. Why? It was grossly inaccurate, and I ended up spending more time and energy speaking and correcting than I would have spent typing everything.

ViaVoice is one of the best in the industry, so imagine the rest. The technology has matured and improved, but speech to text still raises doubts. One of its main difficulties is the immense variation in how people pronounce words.

Not all languages are supported in speech recognition, and those that are supported are often not handled as well as English.

As a result, most devices that run speech recognition software perform reasonably only with English. 

A set of hardware requirements makes speech recognition difficult to deploy in certain cases. You need a microphone that is intelligent enough to filter out background noise but at the same time sensitive enough to capture the voice naturally.

Speaking of background noise, it can cause a whole system to fail. Speech recognition breaks down in many cases because of noises that are out of the user's control.

Speech recognition is proving to be better suited as an input method for new phones and communication technologies like VoIP than as a productivity tool for mass text input.

Applications of Speech Recognition 

The technology is gaining popularity in many areas and has been successful in the following: 

- Device control. Just saying "OK Google" to an Android phone fires up a system that is all ears to your voice commands. 

- Car Bluetooth systems. Many cars are equipped with a system that connects the car radio to your smartphone through Bluetooth. You can then make and receive calls without touching your smartphone, and can even dial numbers just by saying them.

- Voice transcription. In areas where people have to type a lot, intelligent software captures their spoken words and transcribes them into text. This is common in certain word processing software. Voice transcription also works with visual voicemail.
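To illustrate the idea of dialing a number just by saying it, as mentioned for car Bluetooth systems above, here is a small Python sketch. It assumes a recognizer has already turned the speech into a string of digit words; the function name `spoken_digits_to_number` and the word table are hypothetical and only meant to show the mapping step.

```python
# Map spoken digit words to digit characters.
# "oh" is included because people often say it in place of "zero".
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}

def spoken_digits_to_number(transcript):
    """Turn a transcript such as 'five five five' into '555'.

    Returns None if the transcript is empty or contains any word
    that is not a digit, so the caller never dials a partial number.
    """
    digits = []
    for word in transcript.lower().split():
        if word not in DIGIT_WORDS:
            return None
        digits.append(DIGIT_WORDS[word])
    return "".join(digits) or None

print(spoken_digits_to_number("five five five oh one two three"))
```

Rejecting the whole input when one word is not a digit is a deliberate choice in this sketch: a misheard word should lead to a retry, not to a call to the wrong number.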