This paper proposes the use of a formal grammar for the verification of mathematical formulae for a practical mathematical OCR system. Like a C compiler detecting syntax errors in a source file, we want to have a verification mechanism to find errors in the output of mathematical OCR. A linear monadic context-free tree grammar (LM-CFTG) is employed as a formal framework to define "well-formed" mathematical formulae. A cubic time parsing algorithm for LM-CFTGs is presented. For the purpose of practical evaluation, a verification system for mathematical OCR is developed, and the effectiveness of the system is demonstrated by using the ground-truthed mathematical document database InftyCDB-1 and a misrecognition database newly constructed for this study.
All Science Journal Classification (ASJC) codes
- Computational Mathematics
- Computational Theory and Mathematics
- Applied Mathematics