How to Set Up an AI Summarization System on Linux
Follow these steps to set up an AI text summarization system on Ubuntu using the Hugging Face Transformers library.
Step 1: Install Required Dependencies
- Update your system packages:
sudo apt update
sudo apt upgrade
- Install Python (if not already installed). Check if Python 3 is installed:
python3 --version
If Python is not installed, run:
sudo apt install python3 python3-pip
- Install pip and venv (if not already installed):
sudo apt install python3-venv python3-pip
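As a quick sanity check after this step, the standard library can report whether the modules behind pip and venv are importable. This is a minimal sketch: the helper name module_status is made up for illustration. ensurepip is included because python3 -m venv typically relies on it, and on Ubuntu it is provided by the python3-venv package.

```python
import importlib.util
import sys


def module_status(names):
    """Map each module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python version:", sys.version.split()[0])
for name, available in module_status(["pip", "venv", "ensurepip"]).items():
    print(f"{name}: {'available' if available else 'missing'}")
```

Any module reported as missing suggests that the matching apt package from this step still needs to be installed.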
Step 2: Create a Virtual Environment (Optional)
- Create a virtual environment:
python3 -m venv summarization-env
- Activate the virtual environment:
source summarization-env/bin/activate
- Upgrade pip:
pip install --upgrade pip
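To confirm that the environment is active before installing anything into it, Python can compare its own prefixes. A small sketch, with in_virtualenv as a hypothetical helper name:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the system-wide installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

When the environment from this step is active, the interpreter path should point inside the summarization-env directory.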
Step 3: Install Hugging Face Transformers Library
- Install the Hugging Face transformers and datasets libraries:
pip install transformers datasets
- Install PyTorch or TensorFlow:
- For PyTorch:
pip install torch
- For TensorFlow:
pip install tensorflow
- (Optional) For PyTorch on a GPU with CUDA 11.8:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
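Once the installs finish, it can help to see which of these packages actually ended up in the environment. The sketch below uses only the standard library, so it runs even if an install failed; installed_versions is a hypothetical helper, and the package list simply mirrors the commands in this step.

```python
from importlib import metadata

PACKAGES = ["transformers", "datasets", "torch", "tensorflow"]


def installed_versions(names):
    """Return {package: version string, or None when it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Only one of torch or tensorflow needs to be present, so a single "not installed" line for the other backend is expected.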
Step 4: Test Installation with a Sample Script
- Create a Python file named summarizer.py:
nano summarizer.py
- Add the following Python code to perform text summarization:
from transformers import pipeline
# Initialize the summarization pipeline using a pre-trained model (BART in this case)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Input text to summarize
text = """
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions...
"""
# Perform the summarization
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
# Print the summary
print("Summary:", summary[0]['summary_text'])
- Save the file and exit the editor (CTRL + X, then Y to confirm, then Enter to keep the file name).
Step 5: Run the Script
- Run the summarization script:
python3 summarizer.py
- You should see the summarized text in your terminal.
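The script indexes summary[0]['summary_text'] because the pipeline returns a list with one dictionary per input text. The sketch below shows that shape using a hand-written stand-in for the pipeline output, so it runs without downloading a model. extract_summaries is a hypothetical helper, and the sample string is illustrative, not real model output.

```python
def extract_summaries(results):
    """Collect the summary strings from summarization pipeline results."""
    # Each result is a dict whose "summary_text" key holds the generated text.
    return [item["summary_text"].strip() for item in results]


# Hand-written stand-in for the value returned by summarizer(text, ...).
sample = [{"summary_text": " AI simulates human intelligence in machines. "}]

for line in extract_summaries(sample):
    print("Summary:", line)
```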
Step 6: Customize Summarization
- Adjust the summary length by changing the max_length and min_length parameters.
- Change the model by replacing "facebook/bart-large-cnn" with another pre-trained model like "t5-small" or "google/pegasus-cnn_dailymail".
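Both max_length and min_length are counted in tokens rather than words or characters, so fixed values such as 50 and 25 may not suit very short or very long inputs. One possible approach, sketched here as an assumption rather than a library feature, is to derive both bounds from the input's token count. The helper length_bounds and its default ratios are hypothetical.

```python
def length_bounds(input_tokens, max_ratio=0.5, min_ratio=0.2, floor=5):
    """Suggest (max_length, min_length) as fractions of the input token count."""
    # Keep max_length above the floor so very short inputs still get a valid range.
    max_length = max(floor + 1, int(input_tokens * max_ratio))
    min_length = max(floor, int(input_tokens * min_ratio))
    # Make sure min_length always stays strictly below max_length.
    min_length = min(min_length, max_length - 1)
    return max_length, min_length


for tokens in (10, 200, 1000):
    print(tokens, "input tokens ->", length_bounds(tokens))
```

With the pipeline from Step 4, the token count could be taken from len(summarizer.tokenizer(text)["input_ids"]) and the two results passed as max_length and min_length.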
Step 7: Fine-Tuning the Model (Optional)
- Prepare your dataset where each instance consists of an input text and its corresponding summary. You can use datasets like CNN/DailyMail or XSum from Hugging Face.
- Load a dataset:
from datasets import load_dataset
# Load the CNN/DailyMail dataset
dataset = load_dataset("cnn_dailymail", "3.0.0")
# View a sample
print(dataset['train'][0])
- Fine-tune the model on your dataset using the Hugging Face Trainer API.
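Before the Trainer API can be used, the text pairs generally need to be tokenized. The following is a minimal sketch of that preprocessing step for CNN/DailyMail, whose records hold the input text in the article column and the reference summary in highlights. The function name and the length limits are assumptions chosen for illustration, and the tokenizer is passed in as an argument so that the tokenizer matching your chosen model can be used.

```python
def preprocess(batch, tokenizer, max_input_length=1024, max_target_length=128):
    """Tokenize a batch of CNN/DailyMail records for sequence-to-sequence training."""
    # Tokenize the source articles, truncating anything past the input limit.
    model_inputs = tokenizer(
        batch["article"], max_length=max_input_length, truncation=True
    )
    # Tokenize the reference summaries as targets.
    labels = tokenizer(
        text_target=batch["highlights"], max_length=max_target_length, truncation=True
    )
    # The model computes its training loss against the ids stored under "labels".
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

With the dataset loaded above and a tokenizer from AutoTokenizer.from_pretrained("facebook/bart-large-cnn"), this could be applied with dataset.map(lambda batch: preprocess(batch, tokenizer), batched=True) before the result is handed to the trainer.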
Step 8: Use GPU for Faster Inference (Optional)
If you have a GPU and CUDA installed, ensure that your model uses the GPU:
import torch
from transformers import pipeline
# Check if GPU is available
device = 0 if torch.cuda.is_available() else -1
# Initialize the summarization pipeline with GPU support
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
# Input text
text = """
Artificial intelligence (AI) refers to the simulation of human intelligence in machines...
"""
# Perform summarization
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print("Summary:", summary[0]['summary_text'])
Conclusion
By following these steps, you will have set up an AI text summarization system on Ubuntu using Hugging Face's pre-trained models. You can experiment with different models, fine-tune them on your own datasets, and customize the summarization output as needed.