Setting Up Audio Classification on Ubuntu using TensorFlow

This guide provides detailed instructions on setting up an audio classification system on Ubuntu using TensorFlow. We will build a simple audio classification model that can categorize audio files based on their content.

1. Install System Prerequisites

First, make sure your Ubuntu system is up to date and has Python 3, pip, and Git installed. Open a terminal and run the following commands:

sudo apt update
sudo apt upgrade
sudo apt install python3 python3-pip git
    

This will install the necessary system dependencies.
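
Once the packages are installed, you can confirm which interpreter the rest of the guide will use. A quick check from Python itself (output will vary by system):

import sys

print(sys.version)     # should report a Python 3.x release
print(sys.executable)  # path of the interpreter in use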

2. Install TensorFlow

To install TensorFlow, run the following command. Make sure you have the correct NVIDIA drivers and CUDA installed if you want GPU support:

pip3 install tensorflow
    

For CPU-only installations, this command is sufficient.
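
To confirm that TensorFlow imported correctly and to see whether a GPU is visible, you can run a short check (the version and device list will differ on your machine):

import tensorflow as tf

print(tf.__version__)                          # installed TensorFlow version
print(tf.config.list_physical_devices('GPU'))  # empty list on CPU-only setups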

3. Install Additional Libraries

We will also need libraries for audio loading and feature extraction, plus scikit-learn, which the training script uses to split the dataset. Install the following packages:

pip3 install numpy librosa soundfile scikit-learn


These libraries handle loading, processing, and saving audio files; scikit-learn provides the train/test split used later in the script.
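
As a quick sanity check that the audio libraries work, you can load a clip and write it back out. The filename example.wav below is just a placeholder for any WAV file you have on hand:

import librosa
import soundfile as sf

# 'example.wav' is a placeholder; point this at any WAV file you have
audio, sr = librosa.load('example.wav', sr=None)
print(f'{len(audio)} samples at {sr} Hz ({len(audio) / sr:.2f} s)')

# soundfile can write the audio back out, e.g. after processing
sf.write('example_copy.wav', audio, sr)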

4. Prepare Your Dataset

For this example, we will use a dataset containing audio files categorized into different folders based on their labels. Organize your dataset as follows:

dataset/
    ├── class1/
    │   ├── audio1.wav
    │   ├── audio2.wav
    │   └── ...
    ├── class2/
    │   ├── audio1.wav
    │   ├── audio2.wav
    │   └── ...
    └── ...
    

Ensure you have several audio files in each class directory.
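
A small helper like the following (a sketch assuming the dataset/ layout shown above) can confirm that each class folder actually contains audio files before you start training:

import os

dataset_path = 'dataset'  # adjust to where your data lives
for class_label in sorted(os.listdir(dataset_path)):
    class_folder = os.path.join(dataset_path, class_label)
    if os.path.isdir(class_folder):
        wav_files = [f for f in os.listdir(class_folder)
                     if f.lower().endswith('.wav')]
        print(f'{class_label}: {len(wav_files)} WAV files')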

5. Create a Python Script for Audio Classification

Create a new Python script named audio_classification.py to build and train the classification model:

nano audio_classification.py
    

Paste the following code into the file:

import os
import numpy as np
import librosa
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Function to load and preprocess audio files
def load_data(dataset_path):
    labels = []
    features = []
    
    for class_label in os.listdir(dataset_path):
        class_folder = os.path.join(dataset_path, class_label)
        if os.path.isdir(class_folder):
            for audio_file in os.listdir(class_folder):
                file_path = os.path.join(class_folder, audio_file)
                audio, sr = librosa.load(file_path, sr=22050)  # fixed sample rate keeps features comparable
                
                # Extract MFCC features and average them over time so that
                # clips of different lengths all yield fixed-length vectors
                mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
                features.append(np.mean(mfccs, axis=1))
                labels.append(class_label)

    return np.array(features), np.array(labels)

# Load the dataset
dataset_path = 'path/to/your/dataset'  # Replace with your dataset path
X, y = load_data(dataset_path)

# Encode labels
label_to_index = {label: index for index, label in enumerate(np.unique(y))}
y_encoded = np.array([label_to_index[label] for label in y])

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)

# Create a simple fully connected model on the fixed-length MFCC vectors
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(len(label_to_index), activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

# Save the model
model.save('audio_classification_model.h5')
print('Model saved as audio_classification_model.h5')
    

This script performs the following steps:

  • Loads the audio files and extracts MFCC features, averaging each clip's MFCCs over time so every file yields a fixed-length feature vector.
  • Encodes the class labels as integer indices.
  • Splits the dataset into training and test sets.
  • Defines a simple fully connected network for classification.
  • Trains the model and saves it for later use.
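
Note that the label-to-index mapping only exists inside the script, so the saved model alone does not tell you which output corresponds to which class. If you want to keep that information, one option (a sketch using the standard json module; labels.json is just an example name) is to write the mapping next to the model at the end of the script:

import json

# Save the class mapping built during training (assumes label_to_index exists)
with open('labels.json', 'w') as f:
    json.dump(label_to_index, f, indent=2)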

6. Run the Audio Classification Script

Once everything is set up, you can run the audio classification script. Make sure to replace path/to/your/dataset with the actual path to your dataset:

python3 audio_classification.py
    

The script will train the model on your dataset and save it as audio_classification_model.h5.
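
After training, you will typically want to classify new clips with the saved model. The following is a minimal sketch, assuming a hypothetical file new_clip.wav and the same MFCC preprocessing used during training:

import numpy as np
import librosa
import tensorflow as tf

# Load the model saved by the training script
model = tf.keras.models.load_model('audio_classification_model.h5')

# 'new_clip.wav' is a placeholder for the clip you want to classify
audio, sr = librosa.load('new_clip.wav', sr=22050)
mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
features = np.mean(mfccs, axis=1).reshape(1, -1)  # same preprocessing as training

probabilities = model.predict(features)[0]
print('Predicted class index:', int(np.argmax(probabilities)))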

7. Troubleshooting

If you encounter issues, ensure that:

  • All libraries are correctly installed.
  • The dataset path is correctly specified in the script.
  • The audio files are in a format librosa can read (e.g., WAV); the sketch below shows one way to find files that fail to load.
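
If training crashes while loading data, a quick scan like this (a sketch reusing the dataset layout from step 4) can point at the offending files:

import os
import librosa

dataset_path = 'dataset'  # adjust to your dataset location
for root, _, files in os.walk(dataset_path):
    for name in files:
        path = os.path.join(root, name)
        try:
            librosa.load(path, sr=None, duration=1.0)  # load only a short slice
        except Exception as err:
            print(f'Could not load {path}: {err}')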

8. Conclusion

You have successfully set up an audio classification system on Ubuntu using TensorFlow. You can further refine the model by experimenting with different architectures, tuning hyperparameters, or using more advanced feature extraction methods.