Important new developments in Arabographic Optical Character Recognition (OCR)

Document Type

Article

Department

Institute for the Study of Muslim Civilisations, London

Abstract

The Open Islamicate Texts Initiative (OpenITI) team, building on the foundational open-source OCR work of the Leipzig University (LU) Alexander von Humboldt Chair for Digital Humanities, has achieved Optical Character Recognition (OCR) accuracy rates for printed classical Arabic-script texts in the high nineties. These numbers are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines (~400 pages, 87,000 words; see Table 1 for full details). These accuracy rates not only represent a distinct improvement over the actual accuracy rates of the various proprietary OCR options for printed classical Arabic-script texts, but, equally important, they are produced using open-source OCR software called Kraken (developed by Benjamin Kiessling, LU).

Publication (Name of Journal)

Al-Usur al-Wusta: The Journal of Middle East Medievalists
