Thresholding for Mobile OCR: An Introduction – Part 2

Last week we gave you an introduction to Binary, Truncate & To Zero Thresholding, which we hope you found useful! This blog post dives a little deeper into the thresholding topic with Otsu Thresholding and Adaptive Thresholding. So let’s get started!

OTSU THRESHOLDING

Otsu’s method of thresholding, named after Nobuyuki Otsu who first published this thresholding method in 1979, is used to automatically perform clustering-based image thresholding.
But what does that mean?

In global thresholding, one fixed, manually chosen value is used as the threshold for the whole image. To get a good result image, we need to find the right threshold value, which is basically a trial-and-error process. Since we want an automated thresholding algorithm, we need a better method to find the right threshold.
Consider a bi-modal image, i.e. an image whose histogram has two peaks (aka clusters). A good threshold value for such an image lies in the valley between those peaks, and this is what the Otsu method aims for: it automatically calculates a threshold value from the histogram of a bi-modal image.

Figure: input image, its histogram (red line = threshold) and the Otsu threshold image.

In case you are interested in more detailed information on how Otsu thresholding works, continue reading. Otherwise skip this section and continue directly with the code examples.

Variance is a measure of region homogeneity: regions with high homogeneity have a low variance. Otsu’s algorithm searches for the threshold that minimizes the intra-class variance, i.e. the weighted sum of the variances of the two classes. To do so, one has to consider all possible thresholds and compute the variance of each of the two classes of pixels (i.e. the pixels below and above the threshold).

$$\sigma_w^2(t) = \omega_1(t)\,\sigma_1^2(t) + \omega_2(t)\,\sigma_2^2(t)$$

Where the weights $\omega_i$ are the probabilities of the two classes, given by the relative number of pixels in each class when separated by the threshold t, and $\sigma_i^2$ are the variances of the two classes.
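The exhaustive search described above can be sketched in plain C++ without OpenCV. This is only a minimal illustration under a few assumptions: the image is a flat list of 8-bit gray values, class 1 contains the bins 0..t and class 2 the bins t+1..255, and the function names withinClassVariance and otsuBruteForce are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not part of OpenCV): weighted intra-class variance
// omega_1 * sigma_1^2 + omega_2 * sigma_2^2 for threshold t.
// Class 1 = bins 0..t, class 2 = bins t+1..255.
// Returns -1 if one of the two classes is empty.
double withinClassVariance(const std::array<double, 256>& hist, int t) {
    assert(t >= 0 && t < 256);
    double n[2] = {0.0, 0.0};      // pixel count per class
    double sum[2] = {0.0, 0.0};    // sum of gray values per class
    double sumSq[2] = {0.0, 0.0};  // sum of squared gray values per class
    for (int i = 0; i < 256; ++i) {
        const int c = (i <= t) ? 0 : 1;
        n[c] += hist[i];
        sum[c] += i * hist[i];
        sumSq[c] += static_cast<double>(i) * i * hist[i];
    }
    if (n[0] == 0.0 || n[1] == 0.0) return -1.0;
    const double total = n[0] + n[1];
    double result = 0.0;
    for (int c = 0; c < 2; ++c) {
        const double mean = sum[c] / n[c];
        const double variance = sumSq[c] / n[c] - mean * mean;
        result += (n[c] / total) * variance;  // weight = class probability
    }
    return result;
}

// Try every threshold and keep the first one with the smallest intra-class variance.
int otsuBruteForce(const std::vector<std::uint8_t>& pixels) {
    assert(!pixels.empty());
    std::array<double, 256> hist{};
    for (std::uint8_t p : pixels) hist[p] += 1.0;
    int best = 0;
    double bestVariance = std::numeric_limits<double>::max();
    for (int t = 0; t < 255; ++t) {
        const double v = withinClassVariance(hist, t);
        if (v >= 0.0 && v < bestVariance) {
            bestVariance = v;
            best = t;
        }
    }
    return best;
}
```

For an image with a dark cluster around 12 and a bright cluster around 202, the returned threshold falls somewhere between the two clusters. Every candidate threshold requires a full pass over the histogram, which is why this variant is slow.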

Computing this intra-class variance for each of the two classes and for each possible threshold involves a lot of computation, but luckily there is a much faster way. If the intra-class variance is subtracted from the total variance of the combined distribution, the so-called inter-class variance is the result:

$$\sigma_B^2(t) = \sigma^2 - \sigma_w^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2$$

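Where the class probabilities $\omega_i$ are computed from the histogram as:

$$\omega_1(t) = \sum_{i=0}^{t} p_i$$

and

$$\omega_2(t) = \sum_{i=t+1}^{L-1} p_i$$

While the class means $\mu_i$ are computed as:

$$\mu_1(t) = \frac{1}{\omega_1(t)} \sum_{i=0}^{t} p_i\, x_i$$

and

$$\mu_2(t) = \frac{1}{\omega_2(t)} \sum_{i=t+1}^{L-1} p_i\, x_i$$

Where $p_i$ is the fraction of pixels that fall into the ith histogram bin, L is the number of bins (256 for an 8-bit image) and $x_i$ is the value at the center of the ith histogram bin.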
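The faster variant can be sketched as follows: instead of computing two variances per threshold, it keeps running sums for class 1 and picks the threshold that maximizes the inter-class variance omega_1 * omega_2 * (mu_1 - mu_2)^2. This is a minimal sketch, not OpenCV's implementation; the function name otsuThreshold and the flat pixel list are assumptions made for this example, and class 1 is taken to be the bins 0..t.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): returns the first threshold t
// that maximizes the inter-class variance of an 8-bit image.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    assert(!pixels.empty());

    // Histogram with one bin per gray value
    std::array<long long, 256> hist{};
    for (std::uint8_t p : pixels) ++hist[p];

    const long long total = static_cast<long long>(pixels.size());
    long long sumAll = 0;  // sum of all gray values
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    long long n1 = 0;    // number of pixels in class 1 (bins 0..t)
    long long sum1 = 0;  // sum of gray values in class 1
    double bestSigmaB = 0.0;
    int best = 0;
    for (int t = 0; t < 255; ++t) {
        n1 += hist[t];
        sum1 += t * hist[t];
        const long long n2 = total - n1;
        if (n1 == 0 || n2 == 0) continue;  // one class is empty, skip

        const double omega1 = static_cast<double>(n1) / total;
        const double omega2 = static_cast<double>(n2) / total;
        const double mu1 = static_cast<double>(sum1) / n1;
        const double mu2 = static_cast<double>(sumAll - sum1) / n2;
        const double sigmaB = omega1 * omega2 * (mu1 - mu2) * (mu1 - mu2);
        if (sigmaB > bestSigmaB) {
            bestSigmaB = sigmaB;
            best = t;
        }
    }
    return best;
}
```

Each threshold now costs only a handful of arithmetic operations, since the class probabilities and means are updated incrementally. Pixels with a value greater than the returned threshold would then be assigned maxValue.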

Drawbacks

  • The method assumes that the histogram of the image is bi-modal.
  • It breaks down when the two classes are very unequal in size, which can result in two maxima for $\sigma_B^2$:
    • The correct maximum is not necessarily the global one.
    • The selected threshold should correspond to a valley of the histogram.
  • The method does not work well with variable illumination.

C++ Code

To execute Otsu thresholding with OpenCV, pass an additional flag (THRESH_OTSU) to the threshold() function, combined with one of the five threshold types explained in the previous post. Simply pass 0 as the threshold value; it is ignored anyway. The algorithm then finds the optimal threshold value, which is returned as a value of type double. For maxValue it is possible to pass any non-zero value. This value is assigned to every pixel greater than the threshold value. In this example we use 255 to get a black and white binary image.

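#include <opencv2/opencv.hpp>

using namespace cv;

int main() {
    // Read image as 8-bit grayscale
    Mat src = imread("threshold.png", IMREAD_GRAYSCALE);
    Mat dst;

    // Otsu thresholding: the passed threshold (0) is ignored
    double thresh = threshold(src, dst, 0, 255, THRESH_BINARY | THRESH_OTSU);
}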

Results

Figure: two examples, each showing the input image, its histogram (red line = threshold) and the Otsu threshold image.

 

ADAPTIVE THRESHOLDING

In the previous algorithms we used one global threshold to binarize the image, which works fine if the background is relatively uniform. However, a single threshold will not work well if the background intensity varies strongly due to shadows or the direction of illumination.
In that case it is better to use Adaptive Thresholding (aka local, dynamic or areal thresholding).

Figure: input image, binary thresholding (thresh = 100) and adaptive threshold.

The idea of this algorithm is to partition the image into smaller sub-images and then calculate a different threshold for each sub-image. The sub-images tend to have simpler histograms, which usually gives better results for images with uneven illumination.

OpenCV provides a function to perform adaptive thresholding:

void cv::adaptiveThreshold(
    cv::InputArray src,   // input image (8-bit, single channel)
    cv::OutputArray dst,  // result image
    double maxValue,      // the maximal (non-zero) value that can be assigned to output
    int adaptiveMethod,   // adaptive thresholding algorithm (see the two methods below)
    int thresholdType,    // use THRESH_BINARY or THRESH_BINARY_INV only
    int blockSize,        // size of pixel neighborhood (odd and greater than 1), e.g. 3, 5, 7, 9, etc.
    double C              // constant subtracted from mean or weighted mean; usually positive but may be 0 or negative as well
);

There are two methods to calculate the threshold from the blockSize * blockSize neighborhood:

1. ADAPTIVE_THRESH_MEAN_C

The threshold value T(x,y) is the mean of the blockSize * blockSize neighborhood of pixel (x,y) minus a constant value C.

2. ADAPTIVE_THRESH_GAUSSIAN_C

The threshold value T(x,y) is a weighted mean of the blockSize * blockSize neighborhood of pixel (x,y) minus a constant value C. Pixel values closer to the center of the neighborhood have a higher weight when calculating the mean value.
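To make the mean-based method more concrete, here is a minimal sketch in plain C++ without OpenCV. It is not OpenCV's implementation: the function name adaptiveMeanThreshold, the row-major image layout and the border handling (coordinates are clamped to the image, i.e. border pixels are replicated) are assumptions made for this example. A pixel is set to maxValue if it is greater than T(x,y), otherwise to 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): mean-based adaptive threshold
// on a row-major 8-bit grayscale image.
std::vector<std::uint8_t> adaptiveMeanThreshold(const std::vector<std::uint8_t>& src,
                                                int width, int height,
                                                int blockSize, double c,
                                                std::uint8_t maxValue = 255) {
    assert(blockSize > 1 && blockSize % 2 == 1);
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    const int radius = blockSize / 2;
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Sum over the blockSize * blockSize neighborhood of (x, y)
            double sum = 0.0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    // Clamp coordinates so border pixels are replicated
                    const int yy = std::max(0, std::min(y + dy, height - 1));
                    const int xx = std::max(0, std::min(x + dx, width - 1));
                    sum += src[yy * width + xx];
                }
            }
            // T(x, y) = neighborhood mean minus the constant c
            const double threshold = sum / (blockSize * blockSize) - c;
            dst[y * width + x] = (src[y * width + x] > threshold) ? maxValue : 0;
        }
    }
    return dst;
}
```

On a uniform image of value 100 with a single dark pixel, blockSize = 3 and c = 5, only the dark pixel ends up black. This direct version recomputes every neighborhood sum from scratch and is meant for illustration only. The Gaussian method would differ only in replacing the plain sum with a weighted sum.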

C++ Code

#include <opencv2/opencv.hpp>

using namespace cv;

int main() {
    // Read image as 8-bit grayscale
    Mat src = imread("threshold.png", IMREAD_GRAYSCALE);
    Mat dst;

    // Set maxValue, blockSize (odd, greater than 1) and c (constant value)
    double maxValue = 255;
    int blockSize = 9;
    double c = 41;

    // Adaptive threshold: pixels above their local threshold are set to maxValue
    adaptiveThreshold(src, dst, maxValue, ADAPTIVE_THRESH_GAUSSIAN_C, THRESH_BINARY, blockSize, c);
}

Results

The following images show the results of applying adaptive thresholding to the input image with different blockSize values.

Figure: adaptive thresholding results for blockSize = 5, blockSize = 7 and blockSize = 9, each with c = 41.

 

MULTILEVEL THRESHOLDING

So far we have only discussed thresholding of grayscale images. However, it is also possible to threshold color images. This approach is called multilevel, multiband or simply multi thresholding, and it is gaining relevance with the increasing number of color documents. One approach is to designate a separate threshold for each of the RGB channels and then combine the results with an AND operation.
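The per-channel approach can be sketched in a few lines of plain C++. This is only an illustration under stated assumptions: the image is stored as three separate channel planes of equal size, and the names Plane and thresholdRgbAnd as well as the threshold values in the usage note are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Plane = std::vector<std::uint8_t>;  // one color channel, one value per pixel

// Hypothetical helper: a pixel becomes 255 only if every channel exceeds
// its own threshold, i.e. the three binary masks are combined with AND.
Plane thresholdRgbAnd(const Plane& r, const Plane& g, const Plane& b,
                      std::uint8_t tR, std::uint8_t tG, std::uint8_t tB) {
    assert(r.size() == g.size() && g.size() == b.size());
    Plane mask(r.size());
    for (std::size_t i = 0; i < r.size(); ++i) {
        const bool above = r[i] > tR && g[i] > tG && b[i] > tB;
        mask[i] = above ? 255 : 0;
    }
    return mask;
}
```

With thresholds of 100 for all three channels, a pixel (200, 200, 200) is kept, while (200, 50, 200) is rejected because its green channel stays below the threshold.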

Thresholding in RGB reflects the way the camera works and how the data is stored, but it does not correspond to the way people perceive color. Therefore the HSL, HSV or CMYK color models are more often used; these mostly require more sophisticated thresholding algorithms, resulting in higher computational complexity.
These approaches are rather complicated and beyond the scope of this blog post, but don’t hesitate to contact us if you have any questions!

This was our introduction to mobile thresholding. We hope we could give you a good and concise overview of this topic and that you stay tuned for more!

QUESTIONS? LET US KNOW!

If you have questions, suggestions or feedback on this, please don’t hesitate to reach out to us via Facebook, Twitter or simply via [email protected]! Cheers!