Open Source Voice Recognition without a Net – InApps Technology 2022

March 21, 2022 by Phu Nguyen


Install PocketSphinx

The easiest way to install PocketSphinx is by using the Synaptic application manager.

Start Synaptic on the Raspberry Pi and use the search function to get a list of programs related to “PocketSphinx.” Checkmark the list items for installation. Next, click the “Apply” button in the main Synaptic toolbar, followed by another “Apply” button in the Summary pop-up window. Synaptic will go through its paces and install PocketSphinx on the Pi.

Once completed, exit Synaptic and move on to building a language model using the browser-based “lmtool” program.

lmtool converts a regular text file of words and phrases into a pronunciation dictionary (.dic) and a language model (.lm). PocketSphinx uses these two files to recognize the words when you run the program and speak into the microphone.

I used the vim editor to build a simple language model text file. Any editor that outputs plain ASCII text works. I put the following lines in a file called commands.txt and saved it in my /home/pi/hedleytheskull directory.

hello hedley

introducing doctor torq

turn light on

turn light off
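If you would rather generate the corpus file from a script than type it in an editor, here is a minimal Python sketch. The function name write_corpus is my own invention for illustration. The phrases are the four from commands.txt above, and the sketch assumes one lowercase phrase per line is all lmtool needs.

```python
from pathlib import Path

# The four phrases from the article's commands.txt, one per line.
PHRASES = [
    "hello hedley",
    "introducing doctor torq",
    "turn light on",
    "turn light off",
]


def write_corpus(path, phrases):
    """Write one lowercase phrase per line as plain ASCII and return the text."""
    text = "\n".join(phrase.strip().lower() for phrase in phrases) + "\n"
    # encoding="ascii" raises an error if a phrase contains non-ASCII characters.
    Path(path).write_text(text, encoding="ascii")
    return text
```

Calling write_corpus("/home/pi/hedleytheskull/commands.txt", PHRASES) would produce the same file I built by hand in vim.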

You’ll probably want to use fewer than a couple dozen phrases to keep recognition speed up. The more words in the model, the slower the response. Running PocketSphinx without the specific -dict and -lm file options is pretty slow, since it then uses a large default language model. The program will also mix and match the words in your language model, so it will recognize combinations of words that are not spelled out as a line in the file. “hello, doctor torq” would be recognized, for example.
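To get a feel for how small the vocabulary is, and which word combinations fall inside it, here is a rough Python sketch. The helper names vocabulary and in_vocabulary are hypothetical. The check only confirms that every word of an utterance appears somewhere in the corpus, which is a necessary condition for recognition. It says nothing about how reliably PocketSphinx will decode the phrase.

```python
PHRASES = [
    "hello hedley",
    "introducing doctor torq",
    "turn light on",
    "turn light off",
]


def vocabulary(phrases):
    """Return the set of distinct words used across all phrases."""
    return {word for phrase in phrases for word in phrase.lower().split()}


def in_vocabulary(utterance, vocab):
    """True if the utterance is non-empty and uses only words from vocab."""
    words = utterance.lower().replace(",", " ").split()
    return bool(words) and all(word in vocab for word in words)


vocab = vocabulary(PHRASES)
print(len(vocab))                                    # 9
print(in_vocabulary("hello, doctor torq", vocab))    # True
print(in_vocabulary("hello world", vocab))           # False
```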

I opened the Firefox browser and traveled to CMU’s Sphinx knowledge base tool page. Next, I clicked the “Browse” button where it reads “Upload a sentence corpus file” and chose commands.txt from the /home/pi/hedleytheskull directory.

After the tool processes the file, you can save the .dic and .lm files into your working directory. Alternatively, download and unzip the compressed tar (.tgz) file to get the two files into your working directory. My run of the lmtool program produced a file named TAR9363.tgz. I unzipped it in a terminal with the following command line.

pi@hedley:~ tar -xvzf TAR9363.tgz

tar unzipped the files into the following set.

9363.dic

9363.lm

9363.log_pronounce

9363.sent

9363.vocab
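Only the .dic and .lm files matter to PocketSphinx. Since lmtool picks the number in the file names for you, a script can look the pair up rather than hard-code it. The following Python sketch shows one way to do that. The function name find_model_files is made up for this example, and it assumes the directory holds exactly one set of lmtool output.

```python
from pathlib import Path


def find_model_files(directory):
    """Return (dic_path, lm_path) for the single .dic/.lm pair in directory."""
    folder = Path(directory)
    # "*.lm" matches names ending in .lm only, so 9363.log_pronounce is ignored.
    dics = sorted(folder.glob("*.dic"))
    lms = sorted(folder.glob("*.lm"))
    if len(dics) != 1 or len(lms) != 1:
        raise FileNotFoundError(
            f"expected one .dic and one .lm file in {folder}, "
            f"found {len(dics)} and {len(lms)}"
        )
    return dics[0], lms[0]
```

With the files listed above, find_model_files would return the paths to 9363.dic and 9363.lm.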

That’s it for installing PocketSphinx and building a language model. Let’s now look at how to actually recognize speech from the command line.

Talk to Me

I used a standard Logitech C270 USB webcam with a built-in microphone as a voice input device.

PocketSphinx has over 120 command line options. You can see a list by typing pocketsphinx_continuous at the command line. You only need a few of them to get it to recognize your words.

Here’s a sample command line I used.

pi@hedley:~ pocketsphinx_continuous -dict /home/pi/hedleytheskull/speech/9363.dic -lm /home/pi/hedleytheskull/speech/9363.lm -inmic yes -adcdev plughw:2,0 -logfn /dev/null

Note that I used full path names for the .dic and .lm files. The -inmic yes option turns on the microphone for input. Use -logfn /dev/null to suppress the mountain of log data you’d otherwise see on the screen. The log data is great for diagnostics, though, if you need it.
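If you plan to launch the recognizer from a script, it helps to assemble the command as a list of arguments. Here is a small Python sketch along those lines. The function build_command and its quiet parameter are my own naming. The flags are exactly the ones from the command line above, nothing more.

```python
import shlex


def build_command(dic_path, lm_path, capture_device, quiet=True):
    """Assemble the pocketsphinx_continuous argument list used in the article."""
    cmd = [
        "pocketsphinx_continuous",
        "-dict", str(dic_path),      # pronunciation dictionary from lmtool
        "-lm", str(lm_path),         # language model from lmtool
        "-inmic", "yes",             # take input from the microphone
        "-adcdev", capture_device,   # ALSA capture device, e.g. plughw:2,0
    ]
    if quiet:
        # Send the log output to /dev/null; pass quiet=False for diagnostics.
        cmd += ["-logfn", "/dev/null"]
    return cmd


cmd = build_command(
    "/home/pi/hedleytheskull/speech/9363.dic",
    "/home/pi/hedleytheskull/speech/9363.lm",
    "plughw:2,0",
)
print(shlex.join(cmd))
```

The printed line matches the command I typed by hand. On a Pi with PocketSphinx installed, the same list could presumably be handed to subprocess.run(cmd), although I have not wired that up here.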

The -adcdev option took a while to figure out. This option works with the Linux audio subsystems and defines your capture device. Run the following command to find the appropriate device.

pi@hedley:~ cat /proc/asound/pcm

Here are the results on Hedley.

00-00: bcm2835 ALSA : bcm2835 ALSA : playback 7

00-01: bcm2835 ALSA : bcm2835 IEC958/HDMI : playback 1

01-00: MAI PCM vc4-hdmi-hifi-0 : : playback 1

02-00: USB Audio : USB Audio : capture 1

Notice the capture device at the bottom. The number at the beginning of its line, 02-00, means card 2, device 0. Insert those two numbers into the plughw parameter (plughw:2,0 here) and you are good to go.
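The lookup can also be scripted. The Python sketch below pulls the capture devices out of the /proc/asound/pcm text and formats them for -adcdev. The function name capture_devices is hypothetical, and the parsing assumes the colon-separated layout shown in Hedley's output above.

```python
def capture_devices(pcm_text):
    """Return a plughw:card,device string for every line that lists capture."""
    devices = []
    for line in pcm_text.splitlines():
        fields = [field.strip() for field in line.split(":")]
        # The first field is "card-device"; later fields name the streams.
        if len(fields) < 2:
            continue
        if not any(field.startswith("capture") for field in fields[1:]):
            continue
        card, _, device = fields[0].partition("-")
        devices.append(f"plughw:{int(card)},{int(device)}")
    return devices


SAMPLE = """\
00-00: bcm2835 ALSA : bcm2835 ALSA : playback 7
00-01: bcm2835 ALSA : bcm2835 IEC958/HDMI : playback 1
01-00: MAI PCM vc4-hdmi-hifi-0 : : playback 1
02-00: USB Audio : USB Audio : capture 1
"""

print(capture_devices(SAMPLE))  # ['plughw:2,0']
```

On the Pi itself, you would pass in the contents of /proc/asound/pcm instead of the SAMPLE string.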

Enter the command, wait a couple of seconds, then say one of the language model lines to use the program. You should see a “ready” prompt and, shortly afterward, the spoken line on the screen.

Keep Talking

I’d recommend installing PocketSphinx on a clean build of the latest version of Raspbian. Recognition was almost real time a couple of weeks ago, when I first set up the program. Since then I’ve added a lot of software to Hedley’s wimpy little 8GB low-end micro-SD card (on the Pi). It’s at about 95% capacity, which could be slowing things down a bit.

Recent tests take 10 or 15 seconds to recognize a phrase. I’ll install Raspbian on a new 32GB Samsung EVO+ card shortly and expect the response to be back to normal. I highly recommend these cards.

You can also try out the other command line options to tailor the program’s behavior to your needs.

The next step is to integrate speech into some of my Python programs using the PocketSphinx-to-Python API. We’ll explore that topic in a future column.
