The State of Linux Voice Recognition

Transcription technology fares better than voice commands

Speech recognition in Linux trails the Windows and Mac platforms because both Microsoft and Apple have invested considerable time and money in building voice-command and voice-assistant software into their core operating systems.

The situation isn't bleak for Linux, but as with many cutting-edge technologies, the free and open-source world remains a step behind, particularly with voice-command tools.

Linux Speech Recognition

No Linux distribution focuses on speech recognition. Instead, apps that support speech recognition rely on a handful of open-source libraries, including Sphinx, Kaldi, Julius, and Mozilla DeepSpeech.

These libraries rely on a speech corpus, a collection of recorded sound variations, to train the AI so that it correctly transcribes speech to text. However, open-source projects are less sophisticated because they receive fewer contributions to train the AI, which means that most speech-to-text apps for Linux frequently botch the conversion. Usually, they botch it so thoroughly that it's not clear what the original speech could have been.

Options for Linux Speech to Text

Choose one of five approaches:

  • Rely on Linux apps available in your distribution's repositories, if any exist.
  • Use Alexa, which Amazon has made available for Linux, including for Raspberry Pi. Expect to do a lot of custom tweaking to make this arrangement work, but it will work.
  • Access the Google Speech API in your browser through DictationIO. This service works for dictation only; you can't use it for voice commands. Because it's powered by Google's AI, the quality is good.
  • Use a service like Alexa or Google Assistant as a voice-command utility for Linux through the Triggercmd service. Triggercmd runs on your computer; use it to invoke Alexa or Google Assistant and have those tools execute specific Bash scripts based on your command. Say something like, "OK Google, ask trigger command to open the calculator." Google Assistant serves as an intermediary with Triggercmd to run the Bash script specified by the phrase "open the calculator."
  • Use Wine or a virtual machine to run Windows software such as Dragon NaturallySpeaking. With the right tweaking, you can use the Dragon engine for transcription, although this solution doesn't work for voice-command applications.
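
To make the voice-command option more concrete, here is a minimal Python sketch of the last step in that chain: turning an already-recognized phrase such as "open the calculator" into a local command. It does not use or reproduce Triggercmd's actual configuration or API. The table COMMANDS, the functions normalize, resolve, and run, and the two example commands are all hypothetical and assume a desktop where gnome-calculator and loginctl are installed.

```python
import shlex
import subprocess
from typing import List, Optional

# Hypothetical phrase-to-command table. The phrases and commands are
# illustrative assumptions and depend on what is installed on your system.
COMMANDS = {
    "open the calculator": "gnome-calculator",
    "lock the screen": "loginctl lock-session",
}


def normalize(phrase: str) -> str:
    """Lowercase the phrase and collapse whitespace so lookups are forgiving."""
    return " ".join(phrase.lower().split())


def resolve(phrase: str) -> Optional[List[str]]:
    """Return the argument list for a known phrase, or None if it is unknown."""
    command = COMMANDS.get(normalize(phrase))
    return shlex.split(command) if command else None


def run(phrase: str) -> bool:
    """Launch the command mapped to the phrase; report whether one was found."""
    argv = resolve(phrase)
    if argv is None:
        return False
    subprocess.Popen(argv)  # start the program without waiting for it to exit
    return True
```

Calling run("Open the Calculator") would try to launch the calculator, while an unknown phrase returns False and runs nothing. Matching against a fixed table, rather than passing recognized text to a shell, keeps a misheard phrase from executing an arbitrary command.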