Mathematical documents are analyzed from several viewpoints for the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of the mathematical expressions. The results of these analyses clarify the difficulties of recognizing mathematical documents and suggest several promising directions for overcoming them.
|Number of pages|8|
|Journal|International Journal on Document Analysis and Recognition|
|Publication status|Published - Sep 1 2005|
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition
- Computer Science Applications