Setting Up Tabular Classification on Ubuntu using PyTorch

This guide walks through setting up a tabular classification model on Ubuntu with PyTorch. Tabular classification means training a model to predict a categorical label from rows of structured data, such as the rows of a CSV file.

1. Install System Prerequisites

Begin by updating your Ubuntu system and installing necessary development tools:

sudo apt update
sudo apt upgrade
sudo apt install python3 python3-pip git
    

2. Install PyTorch and Required Libraries

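Next, install PyTorch and the packages used for data handling and model evaluation. Recent Ubuntu releases (23.04 and later) block pip from installing into the system Python, so create a virtual environment first and keep it active for the remaining steps:

sudo apt install python3-venv
python3 -m venv .venv
source .venv/bin/activate
pip install torch numpy pandas scikit-learn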
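
Before moving on, it can help to confirm that the packages import cleanly. The snippet below is a minimal sanity check and is not part of the classification script; the function name environment_summary is made up for this example. It collects the installed versions and reports whether PyTorch can see a CUDA GPU (the script in this guide runs on the CPU either way).

```python
import pandas as pd
import sklearn
import torch


def environment_summary():
    """Collect installed versions and GPU visibility (hypothetical helper)."""
    return {
        "torch": torch.__version__,
        "pandas": pd.__version__,
        "scikit-learn": sklearn.__version__,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for name, value in environment_summary().items():
        print(f"{name}: {value}")
```

If any of the imports fails, repeat the pip command above before continuing.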
    

3. Prepare Your Tabular Dataset

This example expects the dataset as a single CSV file with a header row and the following columns:

  • Feature columns: Numerical or categorical variables.
  • Target column: The label you want to predict.

Save the dataset as data.csv in a new data folder:

mkdir data
cp /path/to/your/data.csv data/
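
If you do not have a dataset at hand, a synthetic one is enough to exercise the rest of the guide. The sketch below is an illustration only: the helper name make_sample_csv, the feature column names, and the class labels are all invented for this example. It uses scikit-learn's make_classification to build numeric features and writes them to a CSV with a target column, matching the layout described above. Note that running it as a script overwrites data/data.csv, so skip it if you have already copied your own file there.

```python
import os

import pandas as pd
from sklearn.datasets import make_classification


def make_sample_csv(path, n_samples=500, n_features=8, random_state=42):
    """Write a synthetic three-class dataset to `path` (hypothetical helper)."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=5,
        n_redundant=0,
        n_classes=3,
        random_state=random_state,
    )
    columns = [f"feature_{i}" for i in range(n_features)]
    df = pd.DataFrame(X, columns=columns)
    # String labels, so the LabelEncoder step in the script has something to encode
    df["target"] = pd.Series(y).map({0: "low", 1: "medium", 2: "high"})
    # Create the parent folder if it does not exist yet
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    df.to_csv(path, index=False)
    return df


if __name__ == "__main__":
    df = make_sample_csv("data/data.csv")
    print(df.dtypes)
    print(df["target"].value_counts())
```

The printed dtypes give a quick view of which columns are numeric and which hold text, which is worth checking on a real dataset as well.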
    

4. Write the Tabular Classification Script

Create a Python script named tabular_classification.py with the following code:

import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score

# Load and preprocess the data
data = pd.read_csv('data/data.csv')
target_column = 'target'  # Replace with the actual target column name

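# Separate features and target
X = data.drop(columns=[target_column])
y = data[target_column]

# One-hot encode categorical feature columns so every feature is numeric
# (numeric columns pass through unchanged)
X = pd.get_dummies(X, dtype=float)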

# Encode categorical labels
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature columns
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Define a simple neural network model for tabular classification
class TabularModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Initialize model, loss function, and optimizer
input_dim = X_train.shape[1]
hidden_dim = 64  # Adjust as needed
output_dim = len(np.unique(y))
model = TabularModel(input_dim, hidden_dim, output_dim)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model (full-batch: every step uses the whole training set)
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

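# Evaluate the model on the test set
model.eval()  # switch to inference mode (matters once dropout or batch norm is added)
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    predicted = test_outputs.argmax(dim=1)  # index of the highest logit per row
    accuracy = accuracy_score(y_test, predicted.numpy())
    print(f'Test Accuracy: {accuracy * 100:.2f}%')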
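
The model in the script returns raw scores (logits) with no softmax layer, and the targets are integer class indices. That is the input nn.CrossEntropyLoss expects, since it applies log-softmax internally. The short sketch below uses made-up logits for two rows and three classes to show this: it compares the built-in loss against a manual log-softmax computation and recovers probabilities and predicted classes from the logits.

```python
import torch
import torch.nn as nn

# Made-up logits for 2 rows and 3 classes, with their true class indices
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # integer class indices (dtype torch.long)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Manual equivalent: log-softmax, then the mean negative log-probability
# assigned to the true class of each row
log_probs = torch.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

# Probabilities and predicted classes, if they are needed after training
probs = torch.softmax(logits, dim=1)
predicted = probs.argmax(dim=1)

print(f"built-in loss: {loss.item():.4f}, manual loss: {manual_loss.item():.4f}")
print("predicted classes:", predicted.tolist())
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits directly gives the same predicted classes.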
    

5. Run the Tabular Classification Script

Execute the script to train and evaluate the tabular classification model:

python3 tabular_classification.py
    

The script prints the training loss every 10 epochs and the test accuracy once training finishes.
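
Accuracy alone can hide weak performance on a rare class. As an optional extension, scikit-learn's confusion_matrix and classification_report give a per-class view. The sketch below uses small made-up label lists in place of y_test and predicted.numpy() from the script.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Toy labels standing in for y_test and predicted.numpy() from the script
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

accuracy = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_true, y_pred)
# Per-class precision, recall, and F1 as a formatted text table
report = classification_report(y_true, y_pred, zero_division=0)

print(f"Accuracy: {accuracy:.2f}")
print(matrix)
print(report)
```

In the script itself, the same calls would take y_test and predicted.numpy() as arguments. The class indices in the report correspond to the positions in label_encoder.classes_.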

6. Customize the Model

You can adjust hidden_dim, num_epochs, and the learning rate (the lr argument passed to optim.Adam) in the script to experiment with different configurations and improve model performance.
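
One way to organize such experiments is to wrap training in a function and call it with several settings. The following is a rough sketch rather than a tuning recipe: the function names, the three configurations, and the synthetic data from make_classification are all assumptions made for illustration, and the network is written with nn.Sequential for brevity instead of the TabularModel class.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def load_demo_tensors(seed=42):
    """Synthetic stand-in for the preprocessed data from the script."""
    X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                               n_redundant=0, n_classes=3, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return (torch.tensor(X_train, dtype=torch.float32),
            torch.tensor(X_test, dtype=torch.float32),
            torch.tensor(y_train, dtype=torch.long),
            y_test)


def train_and_evaluate(tensors, hidden_dim, num_epochs, lr, seed=0):
    """Train a one-hidden-layer network and return its test accuracy."""
    X_train, X_test, y_train, y_test = tensors
    torch.manual_seed(seed)  # reproducible weight initialization
    model = nn.Sequential(
        nn.Linear(X_train.shape[1], hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, int(y_train.max()) + 1),
    )
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        predicted = model(X_test).argmax(dim=1)
    return accuracy_score(y_test, predicted.numpy())


if __name__ == "__main__":
    tensors = load_demo_tensors()
    configs = [(16, 50, 0.001), (64, 50, 0.001), (64, 200, 0.01)]
    for hidden_dim, num_epochs, lr in configs:
        acc = train_and_evaluate(tensors, hidden_dim, num_epochs, lr)
        print(f"hidden_dim={hidden_dim} num_epochs={num_epochs} lr={lr} -> accuracy {acc:.3f}")
```

Choosing settings by their test accuracy makes that number an optimistic estimate, so for anything beyond a quick experiment it is safer to compare configurations on a separate validation split and keep the test set for the final check.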

Conclusion

You now have a working tabular classification setup on Ubuntu using PyTorch. It can serve as a foundation for more complex classification tasks and can be improved further with feature engineering and model tuning.