## Abstract

This paper proposes the use of a formal grammar for the verification of mathematical formulae for a practical mathematical OCR system. Like a C compiler detecting syntax errors in a source file, we want to have a verification mechanism to find errors in the output of mathematical OCR. A linear monadic context-free tree grammar (LM-CFTG) is employed as a formal framework to define "well-formed" mathematical formulae. A cubic time parsing algorithm for LM-CFTGs is presented. For the purpose of practical evaluation, a verification system for mathematical OCR is developed, and the effectiveness of the system is demonstrated by using the ground-truthed mathematical document database InftyCDB-1 and a misrecognition database newly constructed for this study.
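The compiler analogy in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the LM-CFTG parser presented in the paper: it is the classical CYK recognizer for a string context-free grammar in Chomsky normal form, which also runs in cubic time in the input length. The toy grammar (sums of `x` and `1`), the token names, and the function `is_well_formed` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper):
#   E -> E R | "x" | "1"
#   R -> P E
#   P -> "+"
BINARY_RULES = {
    ("E", "R"): {"E"},
    ("P", "E"): {"R"},
}
TERMINAL_RULES = {
    "x": {"E"},
    "1": {"E"},
    "+": {"P"},
}


def is_well_formed(tokens, start="E"):
    """CYK recognition: True if `tokens` can be derived from `start`."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals deriving tokens[i : i + k + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(TERMINAL_RULES.get(tok, ()))
    # Three nested loops over span length, start, and split point: O(n^3).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for a, b in product(left, right):
                    table[i][length - 1] |= BINARY_RULES.get((a, b), set())
    return start in table[0][n - 1]


# A recognizer output such as "x + + 1" is rejected, much like a syntax error.
ok = is_well_formed(["x", "+", "1"])
bad = is_well_formed(["x", "+", "+", "1"])
```

In this toy setting, a rejected token sequence plays the role of a suspicious OCR result that would be flagged for review; the paper's framework operates on tree-structured formulae rather than flat token strings.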

Original language | English |
---|---|
Pages (from-to) | 279-298 |
Number of pages | 20 |
Journal | Mathematics in Computer Science |
Volume | 3 |
Issue number | 3 |
DOIs | |
Publication status | Published - May 1 2010 |

## All Science Journal Classification (ASJC) codes

- Computational Mathematics
- Computational Theory and Mathematics
- Applied Mathematics