Important new developments in Arabographic Optical Character Recognition (OCR)
Institute for the Study of Muslim Civilisations, London
The Open Islamicate Texts Initiative (OpenITI) team, building on the foundational open-source OCR work of the Leipzig University (LU) Alexander von Humboldt Chair for Digital Humanities, has achieved Optical Character Recognition (OCR) accuracy rates in the high nineties for printed classical Arabic-script texts. These figures are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines (~400 pages, 87,000 words; see Table 1 for full details). These accuracy rates not only represent a distinct improvement over the actual accuracy rates of the various proprietary OCR options for printed classical Arabic-script texts, but, equally important, they are produced using open-source OCR software called Kraken (developed by Benjamin Kiessling, LU).
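The abstract does not say how the accuracy rates were computed. A common convention in OCR evaluation, assumed here rather than taken from the article, is character accuracy: one minus the Levenshtein edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The minimal Python sketch below illustrates that convention. The function names are hypothetical, and this is not Kraken's own evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed metric: 1 - (edit distance / ground-truth length), floored at 0."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    errors = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


def corpus_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Pool errors and characters over many (ground truth, OCR) line pairs,
    so longer lines carry proportionally more weight."""
    total_chars = sum(len(gt) for gt, _ in pairs)
    if total_chars == 0:
        raise ValueError("ground truth must contain at least one character")
    total_errors = sum(levenshtein(gt, ocr) for gt, ocr in pairs)
    return max(0.0, 1.0 - total_errors / total_chars)
```

For example, if the ground truth is كتاب and the OCR output drops the alif to give كتب, that is one error over four characters, so `character_accuracy` gives 0.75. Python compares Unicode code points, so in practice both strings would probably need the same normalization (for instance of diacritics and letter variants) before comparison.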
Al-Usur al-Wusta: The Journal of Middle East Medievalists
Miller, M. T., Romanov, M. G., & Savant, S. B. (2017). Important new developments in Arabographic Optical Character Recognition (OCR). Al-Usur al-Wusta: The Journal of Middle East Medievalists, 25(10), 1-13.
Available at: https://ecommons.aku.edu/uk_ismc_faculty_publications/3