Quantitative analysis of mathematical documents

Seiichi Uchida, Akihiro Nomura, Masakazu Suzuki

Research output: Contribution to journal (Article)

19 Citations (Scopus)

Abstract

Mathematical documents are analyzed from several viewpoints for the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents, containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of the mathematical expressions. The result of these analyses clarifies the difficulties of recognizing mathematical documents and then suggests several promising directions to overcome them.

Original language: English
Pages (from-to): 211-218
Number of pages: 8
Journal: International Journal on Document Analysis and Recognition
Volume: 7
Issue number: 4
DOI: 10.1007/s10032-005-0142-y
Publication status: Published - Sep 1 2005

Fingerprint

Optical character recognition
Chemical analysis

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Quantitative analysis of mathematical documents. / Uchida, Seiichi; Nomura, Akihiro; Suzuki, Masakazu.

In: International Journal on Document Analysis and Recognition, Vol. 7, No. 4, 01.09.2005, p. 211-218.

Uchida, Seiichi ; Nomura, Akihiro ; Suzuki, Masakazu. / Quantitative analysis of mathematical documents. In: International Journal on Document Analysis and Recognition. 2005 ; Vol. 7, No. 4. pp. 211-218.
@article{b04d10a88d154708a0f947d45bf029a2,
title = "Quantitative analysis of mathematical documents",
abstract = "Mathematical documents are analyzed from several viewpoints for the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents, containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of the mathematical expressions. The result of these analyses clarifies the difficulties of recognizing mathematical documents and then suggests several promising directions to overcome them.",
author = "Seiichi Uchida and Akihiro Nomura and Masakazu Suzuki",
year = "2005",
month = "9",
day = "1",
doi = "10.1007/s10032-005-0142-y",
language = "English",
volume = "7",
pages = "211--218",
journal = "International Journal on Document Analysis and Recognition",
issn = "1433-2833",
publisher = "Springer Verlag",
number = "4",

}

TY  - JOUR
T1  - Quantitative analysis of mathematical documents
AU  - Uchida, Seiichi
AU  - Nomura, Akihiro
AU  - Suzuki, Masakazu
PY  - 2005/9/1
Y1  - 2005/9/1
N2  - Mathematical documents are analyzed from several viewpoints for the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents, containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of the mathematical expressions. The result of these analyses clarifies the difficulties of recognizing mathematical documents and then suggests several promising directions to overcome them.
AB  - Mathematical documents are analyzed from several viewpoints for the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents, containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of the mathematical expressions. The result of these analyses clarifies the difficulties of recognizing mathematical documents and then suggests several promising directions to overcome them.
UR  - http://www.scopus.com/inward/record.url?scp=29444441304&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=29444441304&partnerID=8YFLogxK
U2  - 10.1007/s10032-005-0142-y
DO  - 10.1007/s10032-005-0142-y
M3  - Article
VL  - 7
SP  - 211
EP  - 218
JO  - International Journal on Document Analysis and Recognition
JF  - International Journal on Document Analysis and Recognition
SN  - 1433-2833
IS  - 4
ER  - 