Setting Up Automatic Speech Recognition on Ubuntu

This guide provides step-by-step instructions on setting up an Automatic Speech Recognition (ASR) system on Ubuntu using the Vosk API, a lightweight and versatile open-source speech recognition toolkit.

1. Install System Prerequisites

First, make sure your Ubuntu system is up to date and has Python and pip installed, along with the wget and unzip tools used later to fetch the model. Open a terminal and run the following commands:

sudo apt update
sudo apt upgrade
sudo apt install python3 python3-pip git wget unzip

This will install the necessary system dependencies.

2. Install Vosk API

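The Vosk API supports multiple languages and is easy to use. Install it directly with pip. Recent Ubuntu releases may refuse system-wide pip installs; if that happens, run the command inside a Python virtual environment.

pip3 install vosk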

3. Install Additional Dependencies

You will need to install some additional libraries for audio processing. Install the following libraries:

sudo apt install alsa-utils sox libsox-fmt-all

4. Download a Vosk Model

Vosk provides several pre-trained models for speech recognition. You can download a model suitable for your language. For example, to download the English model, run the following commands:

# Create a directory for models (no error if it already exists)
mkdir -p ~/vosk-models
cd ~/vosk-models

# Download the Vosk English model
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip

# Unzip the downloaded model
unzip vosk-model-en-us-0.22.zip

After downloading, you will have a directory named vosk-model-en-us-0.22 containing the model files.
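
Before moving on, it can help to confirm that the archive unpacked where the later steps expect it. The following is a minimal sketch using only the Python standard library. The helper name find_model_dirs is hypothetical, and it assumes models live under ~/vosk-models in directories whose names start with vosk-model, as in the commands above.

```python
from pathlib import Path


def find_model_dirs(base_dir):
    """Return the names of unpacked model directories under base_dir, sorted."""
    base = Path(base_dir).expanduser()
    if not base.is_dir():
        return []
    return sorted(
        entry.name
        for entry in base.iterdir()
        # Skip the downloaded .zip file and unrelated directories
        if entry.is_dir() and entry.name.startswith("vosk-model")
    )


if __name__ == "__main__":
    models = find_model_dirs("~/vosk-models")
    if models:
        print("Found models:", ", ".join(models))
    else:
        print("No unpacked models found in ~/vosk-models")
```

An empty result usually means the unzip step did not run or ran in a different directory.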

5. Create a Python Script for Speech Recognition

Create a new Python script named asr.py to perform speech recognition using the Vosk API:

nano asr.py

Then, paste the following code into the file:

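import json
import os
import sys

import pyaudio
from vosk import Model, KaldiRecognizer

# Path to the unpacked Vosk model from step 4
model_path = os.path.expanduser("~/vosk-models/vosk-model-en-us-0.22")

# Stop early if the model directory is missing
if not os.path.isdir(model_path):
    print(f"Model not found: {model_path}")
    sys.exit(1)

# Load the model and create a recognizer for 16 kHz audio
model = Model(model_path)
recognizer = KaldiRecognizer(model, 16000)

# Open the default microphone as 16-bit mono at 16 kHz
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=16000,
                input=True,
                frames_per_buffer=8000)
stream.start_stream()

print("Listening... (press Ctrl+C to stop)")

# Recognize speech until interrupted
try:
    while True:
        data = stream.read(4000, exception_on_overflow=False)
        if recognizer.AcceptWaveform(data):
            # A complete utterance was recognized
            text = json.loads(recognizer.Result())["text"]
            if text:
                print(text)
        else:
            # Show the in-progress hypothesis, skipping empty ones
            partial = json.loads(recognizer.PartialResult())["partial"]
            if partial:
                print(f"(partial) {partial}")
except KeyboardInterrupt:
    print("Stopping...")
finally:
    # Release the audio device
    stream.stop_stream()
    stream.close()
    p.terminate()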

This script does the following:

Loads the Vosk model for English.
Sets up an audio input stream using PyAudio.
Listens to the microphone input and recognizes speech in real time.
Prints the recognized text to the console.
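
Vosk returns results as JSON strings: Result() carries the finalized utterance under the "text" key, while PartialResult() carries the in-progress hypothesis under "partial". A small helper can normalize both so that calling code works with plain text. The sketch below uses the hypothetical name extract_text, and the sample strings are illustrative inputs, not captured recognizer output.

```python
import json


def extract_text(raw_json):
    """Return (kind, text) from a Vosk result string.

    kind is "final" when a "text" key is present, "partial" when a
    "partial" key is present, and "unknown" otherwise.
    """
    payload = json.loads(raw_json)
    if "text" in payload:
        return "final", payload["text"]
    if "partial" in payload:
        return "partial", payload["partial"]
    return "unknown", ""


if __name__ == "__main__":
    # Illustrative inputs shaped like recognizer output
    for sample in ('{"partial": "hello wor"}', '{"text": "hello world"}'):
        kind, text = extract_text(sample)
        print(f"{kind}: {text}")
```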

6. Install PyAudio

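The script depends on PyAudio, which is compiled against the PortAudio library on Linux. Install the PortAudio and Python development headers first, then install PyAudio with pip:

sudo apt install portaudio19-dev python3-dev
pip3 install pyaudio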

7. Run the Speech Recognition Script

Once everything is set up, you can run the speech recognition script:

python3 asr.py

Speak into your microphone, and the recognized speech will be printed to the terminal in real time.
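
If no microphone is available, the same recognizer can be fed from a recorded file instead. The sketch below assumes a WAV file that is mono, 16-bit, uncompressed PCM. The function names is_supported_wav and transcribe_wav are hypothetical, and the file and model paths are supplied on the command line. The format check uses only the standard wave module, and vosk is imported inside transcribe_wav so the check can run on its own.

```python
import json
import sys
import wave


def is_supported_wav(path):
    """Return True if the file is mono, 16-bit, uncompressed PCM WAV."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels() == 1
            and wf.getsampwidth() == 2
            and wf.getcomptype() == "NONE"
        )


def transcribe_wav(path, model_path):
    """Feed a WAV file to Vosk in chunks and return the recognized text."""
    from vosk import Model, KaldiRecognizer  # requires the vosk package

    with wave.open(path, "rb") as wf:
        # Use the file's own sample rate rather than assuming 16 kHz
        recognizer = KaldiRecognizer(Model(model_path), wf.getframerate())
        pieces = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result())["text"])
        # Flush whatever audio is still buffered in the recognizer
        pieces.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(piece for piece in pieces if piece)


if __name__ == "__main__" and len(sys.argv) == 3:
    wav_path, model_dir = sys.argv[1], sys.argv[2]
    if not is_supported_wav(wav_path):
        sys.exit("Expected a mono, 16-bit PCM WAV file")
    print(transcribe_wav(wav_path, model_dir))
```

The sox tool installed in step 3 can typically produce a suitable file, for example with sox input.wav -r 16000 -c 1 -b 16 test.wav, where both file names are placeholders.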

8. Troubleshooting

If you encounter issues with microphone access, ensure that your microphone is properly configured and recognized by the system. You can check your audio devices using:

arecord -l

This command lists all available recording devices on your system. Make sure your microphone is set as the default input device in your audio settings.
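
The devices that arecord reports are not always the ones PyAudio sees, so it can also be useful to list input devices from Python. The following is a rough sketch: the function names input_devices and query_pyaudio_devices are hypothetical, and the filter assumes each device dictionary carries the index, name, and maxInputChannels keys that PyAudio's get_device_info_by_index returns.

```python
def input_devices(device_infos):
    """Keep only devices that can record (at least one input channel)."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_pyaudio_devices():
    """Collect device info dictionaries from PyAudio."""
    import pyaudio  # requires the pyaudio package

    p = pyaudio.PyAudio()
    try:
        return [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()


if __name__ == "__main__":
    try:
        infos = query_pyaudio_devices()
    except (ImportError, OSError) as exc:
        infos = []
        print(f"Could not query PyAudio: {exc}")
    for index, name in input_devices(infos):
        print(f"{index}: {name}")
```

If the microphone appears in the list but is not the default, its index can be passed to p.open() through the input_device_index argument.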

9. Conclusion

You have successfully set up an Automatic Speech Recognition system on Ubuntu using the Vosk API. This setup recognizes speech in real time and can be extended for applications such as voice commands and transcription services.
