What is OCR?
Optical Character Recognition (OCR) is used to detect text within images. You may have used an application that scans a credit card or check with your phone so the software can extract the important information; the technology behind that is OCR.
In this workshop we will take a look at OCR and implement our own software that uses a pre-existing OCR library to read the text off of any image we present.
EasyOCR
The library we’ll be taking a look at is easyOCR. EasyOCR is a Python package that performs optical character recognition and can be implemented quickly and simply.
Other OCR libraries often have dependencies that make them hard to work with, but, as the name suggests, easyOCR is easy to implement and use. The library supports over 80 languages and is robust even with noisier images.
It was built using PyTorch, which lets us use CUDA-capable GPUs to speed up detection tremendously.
At the moment easyOCR mainly supports OCR-friendly text: text that is easily legible and not handwritten. The library is expanding quickly and plans to eventually support handwriting detection.
For more information on how the model was trained, check out their GitHub.
Getting Started
For this week’s live coding session we will be implementing easyOCR and using OpenCV to analyze the model’s results. Let’s get started!
Imports
As usual, we will begin by loading in our imports. Unlike other libraries built into Google Colab, easyOCR must first be installed using the Python package manager.
!pip install easyocr
We use the exclamation point in Google Colab to let the cell know that we are running a shell command and not Python code.
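For example, a cell can mix shell commands and regular Python freely:

!python --version   # the ! prefix runs this line in the shell
import sys          # plain lines still run as Python
print(sys.version)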
from easyocr import Reader
import matplotlib.pyplot as plt
import urllib
import cv2
Then we import the rest of our packages.
- easyocr: Used for the basic OCR functions we will be implementing
- matplotlib: Used for plotting our images
- urllib: Allows us to grab pictures from the internet and turn them into interpretable images
- cv2: The OpenCV computer vision library, which we use to draw on our images
A key thing to note is that we only import Reader from easyocr. Reader is the main class we will be using to grab text from images, which is why it’s the only class we need to import from easyocr.
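As a small aside, the same class is also reachable through the module itself, so the targeted import is a stylistic choice:

import easyocr
reader = easyocr.Reader(['en'])  # identical to: from easyocr import Reader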
Loading in Images
First we have to pull an image from the web and convert it to a usable format.
def url_to_image(url):
    # Download the image from the URL and save it locally
    urllib.request.urlretrieve(url, 'img_from_web.jpg')
    # Read the saved file, then convert from OpenCV's BGR to RGB
    image = cv2.imread('img_from_web.jpg')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return image
We create this helper function to do that work for us. We first send a request to retrieve the image from the URL and save it to local storage. Then we read it with OpenCV and store it as an image. Finally, we convert the image’s colorspace: OpenCV reads images in as BGR, and we want to see them as RGB, so we make the appropriate conversion.
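As a side note, the round trip through a file on disk isn’t strictly necessary. Here’s a minimal sketch of an in-memory variant (the function name is ours, and it assumes numpy is available, as it is in Colab):

import numpy as np

def url_to_image_in_memory(url):
    # Download the raw bytes and decode them directly, skipping the temp file
    data = urllib.request.urlopen(url).read()
    arr = np.frombuffer(data, dtype=np.uint8)
    image = cv2.imdecode(arr, cv2.IMREAD_COLOR)  # decoded as BGR
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)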
url = "https://www.bbvausa.com/content/dam/bbva/usa/en/photos/checking-and-savings/clearpoints-card-gateway-sm.png"
img = url_to_image(url)
plt.imshow(img)
plt.show()
We then store our URL as a string variable and invoke our url_to_image function to retrieve the image. Afterwards, we plot the image to make sure everything looks correct.
Extracting Text From Our Image
Now that we have our image in place, we use easyOCR to extract text from our image.
reader = Reader(['en'], gpu=True)  # detect English text, using the GPU
results = reader.readtext(img)     # run detection and recognition on our image
We create an instance of a Reader object, first specifying which language we want to detect; multiple languages can be passed in this list. We also specify that we’d like to use the GPU for detection in order to speed up the process. Finally, we use our reader to read the text. If we print one of the entries in the returned results we get the following:
([[62, 49], [313, 49], [313, 139], [62, 139]], 'BBVA', 0.9981149435043335)
Each result carries three main pieces of data: the bounding box location, the text, and the probability of the detection. We then use this information to add text to our image.
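Since each result carries a probability, one simple use is to drop low-confidence detections before drawing anything. A quick sketch (the 0.5 threshold is an arbitrary choice for illustration):

for (bbox, text, prob) in results:
    if prob < 0.5:
        continue  # skip low-confidence detections
    print(f"{text} ({prob:.2f})")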
Processing Our Image
We’ll want to process our results in a for loop to make sure we handle all of the information obtained.
for (bbox, text, prob) in results:
    # Grab bounding box values
    (tl, tr, br, bl) = bbox
    tl = (int(tl[0]), int(tl[1]))  # top left
    tr = (int(tr[0]), int(tr[1]))  # top right
    br = (int(br[0]), int(br[1]))  # bottom right
    bl = (int(bl[0]), int(bl[1]))  # bottom left
    # Draw the bounding box and overlay the detected text
    cv2.rectangle(img, tl, br, (0, 255, 0), 2)
    cv2.putText(img, text, (tl[0], tl[1]),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 255), 2)
We grab the four corners of each detection based on the (x, y) coordinates provided by easyOCR. We then use this information to draw a rectangle around the text and write the recognized text onto the image.
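One caveat worth knowing: easyOCR returns all four corners of a quadrilateral, so for rotated text a rectangle built from just the top-left and bottom-right points can sit loosely around it. A sketch of drawing the full polygon instead (assuming numpy is imported as np):

import numpy as np

for (bbox, text, prob) in results:
    # Connect all four detected corners as a closed polygon
    pts = np.array(bbox, dtype=np.int32).reshape((-1, 1, 2))
    cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)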
plt.imshow(img)
plt.show()
img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)  # imwrite expects BGR, so convert back
cv2.imwrite("OCR'ed_image.jpg", img)
Finally, we display our image and save it as a file for later use.
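If you want to double-check the result, the saved file can be read back and displayed; just remember that imread returns BGR, so we convert to RGB again before plotting:

saved = cv2.imread("OCR'ed_image.jpg")
plt.imshow(cv2.cvtColor(saved, cv2.COLOR_BGR2RGB))
plt.show()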