Setting Up Image-to-Text (OCR) on Ubuntu using TensorFlow

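This guide shows how to set up an Image-to-Text (Optical Character Recognition, OCR) system on Ubuntu. Tesseract OCR performs the text recognition and OpenCV prepares the image. TensorFlow is installed alongside them, but the script in this guide does not call it directly.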

1. Install System Prerequisites

Start by updating your Ubuntu system and installing necessary dependencies. Open a terminal and run the following commands:

sudo apt update
sudo apt upgrade
sudo apt install python3 python3-pip git
    

2. Install Tesseract OCR

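Tesseract is an open-source OCR engine that extracts text from images. The tesseract-ocr package provides the engine itself. Because pytesseract only calls the tesseract executable, the two development packages are needed only if you also plan to build software against the Tesseract and Leptonica libraries. Install everything with the following command: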

sudo apt install tesseract-ocr libtesseract-dev libleptonica-dev
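
To confirm the installation from Python, a small check like the following can help. This is a minimal sketch that uses only the standard library. The function name tesseract_version is a hypothetical helper, not part of Tesseract or pytesseract.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `tesseract --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some versions print the banner on stderr
        text=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


version = tesseract_version()
print(version if version else "tesseract was not found on PATH")
```

If the function returns None, the executable is not on PATH and the install step above should be repeated.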
    

On Ubuntu, the tesseract-ocr package pulls in English data by default. To recognize other languages, install the matching language pack, typically named tesseract-ocr- followed by the language code:

sudo apt install tesseract-ocr-spa  # For Spanish, for example
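
To see which language packs Tesseract can actually find, its --list-langs option can be queried from Python. The sketch below assumes the first line of that output is a header and every following non-empty line is a language code. Both function names are hypothetical helpers.

```python
import shutil
import subprocess


def parse_languages(list_langs_output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in list_langs_output.splitlines()]
    # Assumption: the first line is a header, the rest are language codes
    return [line for line in lines[1:] if line]


def installed_languages(executable="tesseract"):
    """Return installed language codes, or an empty list if Tesseract is missing."""
    path = shutil.which(executable)
    if path is None:
        return []
    result = subprocess.run(
        [path, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return parse_languages(result.stdout)


print(installed_languages())
```

Once spa appears in the list, it can be selected by passing lang="spa" to pytesseract.image_to_string, or lang="eng+spa" for pages that mix both languages.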
    

3. Install Python Libraries

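Install the required Python libraries: TensorFlow, OpenCV, and pytesseract (the Python wrapper for Tesseract). Invoking pip through python3 ensures the packages are installed for the same interpreter that will run the script. On recent Ubuntu releases, pip may refuse to install into the system Python with an "externally-managed-environment" error. In that case, create and activate a virtual environment first (python3 -m venv, provided by the python3-venv package) and run the command inside it:

python3 -m pip install tensorflow opencv-python pytesseract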
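
A quick way to confirm what was installed is to ask Python for the package versions. This is a minimal sketch that assumes Python 3.8 or newer (for importlib.metadata). The helper name installed_versions is hypothetical.

```python
from importlib import metadata

# Distribution names as given to pip (not the import names)
PACKAGES = ["tensorflow", "opencv-python", "pytesseract"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the same python3 (or inside the same virtual environment) that will run the OCR script, since each interpreter has its own set of packages.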
    

4. Create Python Script for Image-to-Text

Create a new Python script named image_to_text.py that will use Tesseract to perform OCR on images:

nano image_to_text.py
    

Paste the following code into the file:

import cv2
import pytesseract

# Specify the path to the Tesseract executable if it is not on PATH
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'  # Update this path if necessary


def ocr_image(image_path):
    """Return the text that Tesseract recognizes in the image at image_path."""
    # Load the image; cv2.imread returns None instead of raising on failure
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # Convert the image to grayscale for better OCR results
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Binarize with Otsu's method, which picks the threshold automatically.
    # THRESH_BINARY keeps dark text on a light background, which Tesseract
    # handles best; use THRESH_BINARY_INV for light text on a dark page.
    _, binary_image = cv2.threshold(
        gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU
    )
    # Run Tesseract on the binary image
    return pytesseract.image_to_string(binary_image)


if __name__ == "__main__":
    image_path = "path/to/your/image.jpg"  # Replace with the path to your image
    extracted_text = ocr_image(image_path)
    print("Extracted Text:")
    print(extracted_text)
    

This script performs the following steps:

  • Loads an image using OpenCV.
  • Converts the image to grayscale for better OCR accuracy.
  • Applies thresholding to create a binary image.
  • Uses Tesseract to extract text from the binary image.
  • Prints the extracted text to the console.
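
To make the thresholding step concrete, the following pure-Python sketch shows what a fixed global threshold does to grayscale values. It mirrors the rule behind cv2.threshold with THRESH_BINARY (pixels above the threshold become the maximum value, all others become 0) and THRESH_BINARY_INV (the reverse). The function threshold_binary and the tiny sample image are illustrative only; OpenCV performs the same operation natively on whole arrays.

```python
def threshold_binary(pixels, thresh, maxval=255, inverted=False):
    """Apply a fixed global threshold to a 2-D list of grayscale values (0-255).

    Pixels brighter than `thresh` become `maxval` and the rest become 0;
    with inverted=True the two outputs are swapped.
    """
    above, below = (0, maxval) if inverted else (maxval, 0)
    return [[above if value > thresh else below for value in row] for row in pixels]


# A tiny 2x3 "image": dark strokes (low values) on a light page (high values)
image = [
    [30, 200, 210],
    [220, 40, 205],
]

print(threshold_binary(image, 150))                 # [[0, 255, 255], [255, 0, 255]]
print(threshold_binary(image, 150, inverted=True))  # [[255, 0, 0], [0, 255, 0]]
```

In the first result the dark strokes stay dark (0) on a white page. In the inverted result they become white on black, which matters because Tesseract generally works best with dark text on a light background.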

5. Run the Image-to-Text Script

To execute the script, run the following command in your terminal:

python3 image_to_text.py
    

Before running it, replace path/to/your/image.jpg in the script with the actual path to the image you want to process. The script prints the extracted text to the console.
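
Editing the script for every new image gets tedious. One possible variation, sketched below, reads the image path from the command line instead. The --lang option and the function names are illustrative additions, not part of the script above. cv2 and pytesseract are imported inside run_ocr, so the argument parsing can be tried even before those libraries are installed.

```python
import argparse


def parse_args(argv=None):
    """Parse the image path and an optional Tesseract language code."""
    parser = argparse.ArgumentParser(description="Extract text from an image.")
    parser.add_argument("image", help="path to the image file")
    parser.add_argument("--lang", default="eng", help="Tesseract language code")
    return parser.parse_args(argv)


def run_ocr(image_path, lang="eng"):
    """Load the image as grayscale and return the text Tesseract finds in it."""
    import cv2
    import pytesseract

    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    return pytesseract.image_to_string(image, lang=lang)


def main(argv=None):
    """Parse the command line, run OCR, and print the result."""
    args = parse_args(argv)
    print(run_ocr(args.image, lang=args.lang))
```

Saved under a name of your choosing with a final line that calls main(), the script would take the image path as its first argument, for example path/to/your/image.jpg --lang eng.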

6. Troubleshooting

If you encounter any issues, check the following:

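  • Ensure Tesseract is installed and that tesseract_cmd in the script points to the executable (run which tesseract to find its location).
  • Make sure the image path is valid and the image file exists. Note that cv2.imread returns None rather than raising an error when it cannot read a file.
  • Check that the required Python libraries are installed for the same interpreter that runs the script (python3 -m pip list shows them).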
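
The checks above can also be automated. The following sketch uses only the standard library. The function diagnose, its default arguments, and its messages are hypothetical, and the default path /usr/bin/tesseract is the one used in the script.

```python
import importlib.util
import os
import shutil


def diagnose(image_path, tesseract_cmd="/usr/bin/tesseract",
             modules=("cv2", "pytesseract", "tensorflow")):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # 1. The Tesseract executable must exist at the given path (or on PATH)
    if shutil.which(tesseract_cmd) is None:
        problems.append(f"Tesseract executable not found: {tesseract_cmd}")
    # 2. The image must be an existing, readable file
    if not os.path.isfile(image_path):
        problems.append(f"Image file does not exist: {image_path}")
    elif not os.access(image_path, os.R_OK):
        problems.append(f"Image file is not readable: {image_path}")
    # 3. Each Python library must be importable by this interpreter
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not installed: {name}")
    return problems


for problem in diagnose("path/to/your/image.jpg"):
    print("Problem:", problem)
```

An empty result does not guarantee good OCR output, only that the three common setup problems are ruled out.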

7. Conclusion

You have set up an Image-to-Text OCR system on Ubuntu with Tesseract OCR, OpenCV, and TensorFlow installed. The image_to_text.py script extracts text from an image by preprocessing it with OpenCV and passing the result to Tesseract.