Mobile Application for Recognizing Text in Degraded Document Images Using Optical Character Recognition with Adaptive Document Image Binarization

Angie M. Ceniza, Tom Kalvin B. Archival, and Kate V. Bongo
DCIS, University of San Carlos, Cebu City, Philippines

Abstract—Books and documents degrade over time, which threatens the readability of the printed text. Stains can overlap and obscure the text, and ink fading can erase it altogether. Converting these texts into digital format can help preserve them, and Optical Character Recognition (OCR) is used to transform them into digital text. With the increasing computing and digital imaging capabilities of today's smartphones, we can use them as a convenient tool to capture images of these documents and perform OCR directly. In this paper, we propose a mobile application that recognizes text in degraded document images using Tesseract as the OCR engine, with Adaptive Document Image Binarization to improve the engine's performance on degraded document images. The experimental results showed an average character accuracy of 93.17% and word accuracy of 85.82% across 8 degraded document images.

Index Terms—OCR, binarization, degradation, mobile application
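The binarization step named in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact thresholding formula, so this sketch assumes a Sauvola-style local threshold, T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of a window around each pixel. The function name `sauvola_binarize`, the parameter names, and the default values (k = 0.5, R = 128, a 5-pixel window) are illustrative assumptions, not the authors' implementation. It is written in plain Python for readability, not speed.

```python
import math


def sauvola_binarize(gray, window=5, k=0.5, r=128.0):
    """Binarize a grayscale image given as a list of rows of 0-255 values.

    Returns a grid of the same size: 0 marks ink, 255 marks background.
    Assumption: a Sauvola-style local threshold; the source abstract does
    not specify the formula or its parameters.
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the local neighbourhood, clipped at the image border.
            values = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            # Local threshold: lower in flat regions, higher near strong contrast.
            threshold = mean * (1.0 + k * (std / r - 1.0))
            if gray[y][x] <= threshold:
                out[y][x] = 0
    return out


def make_stained_page():
    """Hypothetical 5x10 test page: clean left half, stained right half,
    with one dark text pixel in each half."""
    page = [[200] * 5 + [110] * 5 for _ in range(5)]
    page[2][2] = 20  # text on clean paper
    page[2][7] = 20  # text under the stain
    return page
```

Because the threshold follows the local mean, the stained half of the synthetic page is still treated as background, while a single global cut-off between the two background levels would mark the whole stain as ink. In a pipeline like the one the abstract describes, the binarized image would then be passed to the OCR engine.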

Cite: Angie M. Ceniza, Tom Kalvin B. Archival, and Kate V. Bongo, "Mobile Application for Recognizing Text in Degraded Document Images Using Optical Character Recognition with Adaptive Document Image Binarization," Journal of Image and Graphics, Vol. 6, No. 1, pp. 44-47, June 2018. doi: 10.18178/joig.6.1.44-47