Machine-reading Arabic

Feb. 17, 2005
Computer scientists at the University at Buffalo's Center for Unified Biometrics and Sensors (CUBS) are developing the first optical character-recognition (OCR) software for handwritten and machine-printed Arabic documents.

The new software will make it possible to scan Arabic documents digitally in search of specific information or keywords for intelligence-gathering and other applications.

The new software is said to be powerful enough to recognize handwritten annotations in the margins of a machine-printed document.

Its development is said to be particularly noteworthy because Arabic presents important challenges to computer science. Characters may take different forms depending on whether they appear at the beginning, middle, or end of a word, and boundaries between words are not always marked consistently. Arabic vowels are also pronounced but often not written. So in addition to the benefits for readers of Arabic, this project will help push the frontiers of computer vision, pattern recognition, and artificial intelligence in general.
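The position-dependent letter forms mentioned above can be seen in the Unicode character tables. The short Python sketch below is only an illustration and is not the CUBS software: it takes the four presentation forms of one letter, beh, and shows that all of them fold back to the same underlying character. The names BEH_FORMS, base_letter, and describe_forms are hypothetical, chosen for this example.

```python
import unicodedata

# Four contextual shapes of the Arabic letter beh (U+0628), as encoded in
# the Unicode "Arabic Presentation Forms-B" block. Illustrative data only.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def base_letter(glyph: str) -> str:
    """Fold a presentation-form glyph back to its underlying letter (NFKC)."""
    return unicodedata.normalize("NFKC", glyph)


def describe_forms(forms: dict) -> dict:
    """Map each position to (Unicode name of the shape, underlying letter)."""
    return {
        position: (unicodedata.name(glyph), base_letter(glyph))
        for position, glyph in forms.items()
    }


if __name__ == "__main__":
    for position, (name, base) in describe_forms(BEH_FORMS).items():
        print(f"{position:>8}: {name} -> U+{ord(base):04X}")
```

A recognizer has to work in the opposite direction, deciding from a scanned shape which underlying letter it represents, which is one reason a single letter needs several visual models.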

A new software tool is also being developed to create OCR software for Devanagari script, which will allow digitization of documents in Sanskrit, Hindi, and dozens of other Indian and South Asian languages.
