How to Set Up an AI Summarization System on Linux

Follow these steps to set up an AI text summarization system on Ubuntu using the Hugging Face Transformers library.

Step 1: Install Required Dependencies

  1. Update your system packages:
    sudo apt update
    sudo apt upgrade
  2. Install Python if it is not already present. First check for Python 3:
    python3 --version
    If the command is not found, install it:
    sudo apt install python3
  3. Install pip and venv (if not already installed):
    sudo apt install python3-venv python3-pip

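Step 2: Create a Virtual Environment (Recommended)

On Ubuntu 23.04 and later, pip refuses to install packages into the system Python (it reports an "externally-managed-environment" error), so a virtual environment is effectively required there.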

  1. Create a virtual environment:
    python3 -m venv summarization-env
  2. Activate the virtual environment:
    source summarization-env/bin/activate
  3. Upgrade pip:
    pip install --upgrade pip
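
To confirm that the virtual environment is really active before installing anything, a small check like the one below can help. It is a sketch that relies on the standard behavior of venv: inside an active environment, sys.prefix points at the environment directory, while sys.base_prefix still points at the system installation. The file name check_venv.py is just an example.

```python
"""check_venv.py: confirm the interpreter is running inside a virtual environment."""
import sys


def in_virtualenv():
    """True when sys.prefix differs from the base installation's prefix (venv active)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

After running source summarization-env/bin/activate, python check_venv.py should report True and an interpreter path inside summarization-env.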

Step 3: Install the Hugging Face Transformers Library

  1. Install the Hugging Face transformers and datasets libraries:
    pip install transformers datasets
  2. Install a deep learning backend, either PyTorch or TensorFlow (the GPU example in Step 8 uses PyTorch):
    • For PyTorch:
      pip install torch
    • For TensorFlow:
      pip install tensorflow
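  3. (Optional) To get a PyTorch build for a specific CUDA version, install it from PyTorch's own package index instead. This example targets CUDA 11.8; pick the index that matches your driver:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118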
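
Before moving on, you may want to confirm which of these packages actually ended up in the environment. The sketch below uses only the standard library (importlib.metadata) to look up installed versions, and imports PyTorch only if it is present to ask whether a CUDA GPU is visible. The package tuple and the file name check_install.py are assumptions; adjust them to the backend you chose.

```python
"""check_install.py: report which summarization dependencies are installed (a sketch)."""
from importlib import metadata
from importlib.util import find_spec

# Distribution names from Step 3; edit this tuple if you use a different set.
PACKAGES = ("transformers", "datasets", "torch", "tensorflow")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version string, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_available():
    """Return True only if PyTorch is installed and can see a CUDA GPU."""
    if find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works without PyTorch

    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12} {version or 'not installed'}")
    print("CUDA GPU visible to PyTorch:", cuda_available())
```

Only one of torch or tensorflow needs to show a version; the other can stay "not installed".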

Step 4: Test Installation with a Sample Script

  1. Create a Python file named summarizer.py:
    nano summarizer.py
  2. Add the following Python code to perform text summarization:
    from transformers import pipeline
    
    # Initialize the summarization pipeline using a pre-trained model (BART in this case)
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    
    # Input text to summarize
    text = """
    Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions...
    """
    
    # Perform the summarization
    summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
    
    # Print the summary
    print("Summary:", summary[0]['summary_text'])
  3. Save the file and exit nano (press Ctrl+X, then Y, then Enter).

Step 5: Run the Script

  1. Run the summarization script:
    python3 summarizer.py
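  2. You should see the summarized text in your terminal. The first run downloads the model weights to the local Hugging Face cache, so it can take several minutes.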

Step 6: Customize Summarization

  • Adjust the summary length by changing the max_length and min_length parameters (both are measured in tokens, not words).
  • Change the model by replacing "facebook/bart-large-cnn" with another pre-trained model like "t5-small" or "google/pegasus-cnn_dailymail".
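
Rather than editing summarizer.py for every experiment, you can expose these settings as command-line flags. The sketch below is one way to do that with argparse; the script name summarize_cli.py and the flag names --model, --max-length and --min-length are illustrative choices, not part of Transformers. Defaults mirror the values used in Step 4.

```python
"""summarize_cli.py: a sketch of summarizer.py with the Step 6 settings as flags."""
import argparse
import sys


def build_parser():
    """Define the command-line options (all flag names here are illustrative)."""
    parser = argparse.ArgumentParser(description="Summarize a text file.")
    parser.add_argument("input", nargs="?", default="-",
                        help="path to a text file; '-' or omitted reads stdin")
    parser.add_argument("--model", default="facebook/bart-large-cnn",
                        help="model ID, e.g. t5-small or google/pegasus-cnn_dailymail")
    parser.add_argument("--max-length", type=int, default=50,
                        help="maximum summary length in tokens")
    parser.add_argument("--min-length", type=int, default=25,
                        help="minimum summary length in tokens")
    return parser


def parse_args(argv=None):
    """Parse argv and reject a minimum length larger than the maximum."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.min_length > args.max_length:
        parser.error("--min-length must not exceed --max-length")
    return args


def read_text(path):
    """Read the input text from a file, or from stdin when path is '-'."""
    if path == "-":
        return sys.stdin.read()
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def main(argv=None):
    args = parse_args(argv)
    # Imported lazily: transformers is slow to import, and --help or
    # argument errors should not have to wait for it.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=args.model)
    summary = summarizer(read_text(args.input), max_length=args.max_length,
                         min_length=args.min_length, do_sample=False)
    print("Summary:", summary[0]["summary_text"])


if __name__ == "__main__":
    main()
```

For example, python3 summarize_cli.py article.txt --model t5-small --max-length 80 would summarize a (hypothetical) article.txt with T5 instead of BART.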

Step 7: Fine-Tuning the Model (Optional)

  1. Prepare a dataset in which each example consists of an input text and its reference summary. You can also start from public summarization datasets on the Hugging Face Hub, such as CNN/DailyMail or XSum.
  2. Load a dataset:
    from datasets import load_dataset
    
    # Load the CNN/DailyMail dataset
    dataset = load_dataset("cnn_dailymail", "3.0.0")
    
    # View a sample
    print(dataset['train'][0])
  3. Fine-tune the model on your dataset with the Transformers Trainer API; for sequence-to-sequence models such as BART, use the Seq2SeqTrainer variant.
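
A minimal fine-tuning sketch follows, assuming Transformers 4.22 or newer (for the text_target tokenizer argument) and the CNN/DailyMail column names article and highlights; rename those columns for your own data. The output directory bart-summarizer, the 1,000-example slice, and the hyperparameters are placeholders rather than tuned values. Note that facebook/bart-large-cnn was already fine-tuned on CNN/DailyMail, so in practice you would point this at your own dataset, and training a model of this size needs a GPU with a lot of memory.

```python
"""finetune.py: a minimal Seq2SeqTrainer sketch (settings below are assumptions)."""

# Hypothetical settings; adjust them for your hardware and data.
MODEL_NAME = "facebook/bart-large-cnn"
MAX_INPUT_TOKENS = 1024   # BART accepts at most 1024 input tokens
MAX_TARGET_TOKENS = 128   # assumed cap on reference-summary length
OUTPUT_DIR = "bart-summarizer"


def preprocess(batch, tokenizer):
    """Tokenize articles as model inputs and highlights as labels."""
    model_inputs = tokenizer(batch["article"], max_length=MAX_INPUT_TOKENS,
                             truncation=True)
    labels = tokenizer(text_target=batch["highlights"],
                       max_length=MAX_TARGET_TOKENS, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


def main():
    # Heavy imports are deferred so preprocess() can be reused without them.
    from datasets import load_dataset
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    # A small slice keeps a first trial run short; drop it for a full run.
    train = load_dataset("cnn_dailymail", "3.0.0", split="train[:1000]")
    train = train.map(lambda b: preprocess(b, tokenizer), batched=True,
                      remove_columns=train.column_names)

    args = Seq2SeqTrainingArguments(
        output_dir=OUTPUT_DIR,
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
        save_strategy="no",  # skip intermediate checkpoints; save once at the end
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=train,
        # Pads inputs and labels per batch; label padding uses -100 so it is ignored by the loss.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(OUTPUT_DIR)
    tokenizer.save_pretrained(OUTPUT_DIR)


if __name__ == "__main__":
    main()
```

Once training finishes, pipeline("summarization", model="bart-summarizer") should load the saved model in place of facebook/bart-large-cnn.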

Step 8: Use GPU for Faster Inference (Optional)

If you have an NVIDIA GPU with working CUDA drivers, pass the device argument so that the pipeline runs the model on the GPU:

    import torch
    from transformers import pipeline

    # Use the first CUDA GPU (device 0) if one is available; -1 means CPU
    device = 0 if torch.cuda.is_available() else -1

    # Initialize the summarization pipeline on the selected device
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)

    # Input text
    text = """
    Artificial intelligence (AI) refers to the simulation of human intelligence in machines...
    """

    # Perform summarization
    summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
    print("Summary:", summary[0]['summary_text'])
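
A GPU pays off most when it processes several texts at once. The sketch below passes a list of texts to the same pipeline together with its batch_size argument, which returns one result per input in the same order. The function names pick_device and summarize_all, the file name batch_summarize.py, and the batch size of 8 are assumptions for illustration; a larger batch needs more GPU memory.

```python
"""batch_summarize.py: sketch of summarizing several texts in one pipeline call."""
from importlib.util import find_spec


def pick_device():
    """Return 0 (first CUDA GPU) if PyTorch sees one, else -1 (CPU)."""
    if find_spec("torch") is None:
        return -1
    import torch

    return 0 if torch.cuda.is_available() else -1


def summarize_all(texts, batch_size=8, max_length=50, min_length=25):
    """Summarize a list of texts, returning one summary string per input."""
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn",
                          device=pick_device())
    # A list input yields one result dict per text, in the same order.
    results = summarizer(texts, batch_size=batch_size, max_length=max_length,
                         min_length=min_length, do_sample=False)
    return [result["summary_text"] for result in results]


if __name__ == "__main__":
    docs = [
        "Artificial intelligence (AI) refers to the simulation of human intelligence in machines...",
        # hypothetical: add more documents here
    ]
    for summary in summarize_all(docs):
        print("Summary:", summary)
```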

Conclusion

By following these steps, you will have set up an AI text summarization system on Ubuntu using Hugging Face's pre-trained models. You can experiment with different models, fine-tune them on your own datasets, and customize the summarization output as needed.