The State of Linux Voice Recognition

Transcription technology fares better than voice commands

Speech recognition on Linux trails Windows and macOS because Microsoft and Apple have invested considerable time and money in building voice-command and voice-assistant software into their core operating systems.

The situation isn't bleak for Linux, but as with so many cutting-edge technologies, the free and open-source world remains a step behind, particularly with voice-command tools.

Native Linux Speech Recognition

No Linux distribution focuses on speech recognition. Instead, Linux apps that offer speech recognition rely on a handful of open-source libraries, including CMU Sphinx, Kaldi, Julius, and Mozilla DeepSpeech.

These libraries rely on a speech corpus, a large collection of recorded speech paired with transcripts, to train their models to recognize variations in sounds and convert speech to text correctly. However, the open-source projects are considerably less sophisticated because far less training data is contributed to them, so most speech-to-text apps for Linux frequently botch the conversion. Often the result is so garbled that it's impossible to tell what the speaker originally said.
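How badly is "botched"? Speech-recognition accuracy is commonly measured with word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. The Python sketch below computes WER with a standard edit-distance table. It assumes words are separated by whitespace, ignores case, and does not strip punctuation; the sample transcripts are invented for illustration and are not output from any of the libraries named above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example of a badly misheard command.
    reference = "open the calculator"
    hypothesis = "often the calculate her"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Running it prints `WER: 1.00`: two substitutions plus one insertion against a three-word reference, so a transcript that still shares a word with the original scores as entirely wrong.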

Options for Linux Speech to Text

You have five options for speech recognition on Linux.

First, use native Linux apps from your distribution's repositories, if any are available.

Second, Amazon has made Alexa available for Linux, including on the Raspberry Pi. Getting it running takes a lot of custom configuration, but it does work.

Third, use DictationIO, which accesses the Google Speech API through your browser. The service handles dictation only, not voice commands, but because it's powered by Google's own AI, the quality is quite good.

Fourth, use Alexa or Google Assistant as a voice-command utility for Linux through the Triggercmd service. Triggercmd runs on your computer; when you give Alexa or Google Assistant a command, the assistant passes it to Triggercmd, which runs the Bash script associated with that command. For example, say "OK Google, ask trigger command to open the calculator." Google Assistant relays the request to Triggercmd, which runs the Bash script assigned to the phrase "open the calculator."
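Conceptually, the computer side of that chain maps a recognized phrase to a command and runs it. The Python sketch below illustrates only that idea; it is not Triggercmd's code or configuration format. The `COMMANDS` table and the `normalize`, `resolve`, and `dispatch` functions are hypothetical, and the `gnome-calculator` and `gnome-terminal` entries assume a GNOME desktop.

```python
import shlex
import subprocess
from typing import List, Optional

# Hypothetical phrase-to-command table for illustration only; Triggercmd
# stores its own command definitions and does not read this dictionary.
COMMANDS = {
    "open the calculator": "gnome-calculator",
    "open the terminal": "gnome-terminal",
}


def normalize(phrase: str) -> str:
    """Lower-case the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def resolve(phrase: str) -> Optional[List[str]]:
    """Return the argument list for a known phrase, or None if nothing matches."""
    command = COMMANDS.get(normalize(phrase))
    return shlex.split(command) if command else None


def dispatch(phrase: str, dry_run: bool = True) -> Optional[List[str]]:
    """Look up the phrase and, unless dry_run is set, start the matching program."""
    argv = resolve(phrase)
    if argv is not None and not dry_run:
        # Popen starts the program without waiting for it to exit.
        subprocess.Popen(argv)
    return argv


if __name__ == "__main__":
    print(dispatch("Open the  Calculator"))
    print(dispatch("play some music"))
```

With the default `dry_run=True`, the script only prints the matched command (or `None`); calling `dispatch("open the calculator", dry_run=False)` would actually launch the program. Normalizing the phrase first lets small differences in capitalization or spacing from the speech recognizer still match.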

Finally, run Windows speech-recognition software such as Dragon NaturallySpeaking under Wine or in a virtual machine. With the right tweaking, you can use Dragon's engine for transcription, although this approach won't work for voice commands.