Detecting Mathematical Expressions in Scientific Document Images Using a U-Net Trained on a Diverse Dataset

Wataru Ohyama, Masakazu Suzuki, Seiichi Uchida

Research output: Contribution to journal, Article

Abstract

A detection method for mathematical expressions in scientific document images is proposed. Inspired by the promising performance of U-Net, a convolutional network architecture originally proposed for the semantic segmentation of biomedical images, the proposed method uses image conversion by a U-Net framework. The proposed method does not use any information from mathematical and linguistic grammar so that it can be a supplemental bypass in the conventional mathematical optical character recognition (OCR) process pipeline. The evaluation experiments confirmed that (1) the performance of mathematical symbol and expression detection by the proposed method is superior to that of InftyReader, which is state-of-the-art software for mathematical OCR; (2) the coverage of the training dataset to the variation of document style is important; and (3) retraining with small additional training samples will be effective to improve the performance. An additional contribution is the release of a dataset for benchmarking the OCR for scientific documents.

Original language: English
Article number: 8861044
Pages (from-to): 144030-144042
Number of pages: 13
Journal: IEEE Access
Volume: 7
DOI: 10.1109/ACCESS.2019.2945825
Publication status: Published - Jan 1 2019

Fingerprint

Optical character recognition
Information use
Benchmarking
Network architecture
Linguistics
Pipelines
Semantics
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)

Cite this

Detecting Mathematical Expressions in Scientific Document Images Using a U-Net Trained on a Diverse Dataset. / Ohyama, Wataru; Suzuki, Masakazu; Uchida, Seiichi.

In: IEEE Access, Vol. 7, 8861044, 01.01.2019, p. 144030-144042.

@article{b455d157adc84a608ba8eb46c26f4ec8,
title = "Detecting Mathematical Expressions in Scientific Document Images Using a U-Net Trained on a Diverse Dataset",
abstract = "A detection method for mathematical expressions in scientific document images is proposed. Inspired by the promising performance of U-Net, a convolutional network architecture originally proposed for the semantic segmentation of biomedical images, the proposed method uses image conversion by a U-Net framework. The proposed method does not use any information from mathematical and linguistic grammar so that it can be a supplemental bypass in the conventional mathematical optical character recognition (OCR) process pipeline. The evaluation experiments confirmed that (1) the performance of mathematical symbol and expression detection by the proposed method is superior to that of InftyReader, which is state-of-the-art software for mathematical OCR; (2) the coverage of the training dataset to the variation of document style is important; and (3) retraining with small additional training samples will be effective to improve the performance. An additional contribution is the release of a dataset for benchmarking the OCR for scientific documents.",
author = "Wataru Ohyama and Masakazu Suzuki and Seiichi Uchida",
year = "2019",
month = "1",
day = "1",
doi = "10.1109/ACCESS.2019.2945825",
language = "English",
volume = "7",
pages = "144030--144042",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - Detecting Mathematical Expressions in Scientific Document Images Using a U-Net Trained on a Diverse Dataset

AU - Ohyama, Wataru

AU - Suzuki, Masakazu

AU - Uchida, Seiichi

PY - 2019/1/1

Y1 - 2019/1/1

N2 - A detection method for mathematical expressions in scientific document images is proposed. Inspired by the promising performance of U-Net, a convolutional network architecture originally proposed for the semantic segmentation of biomedical images, the proposed method uses image conversion by a U-Net framework. The proposed method does not use any information from mathematical and linguistic grammar so that it can be a supplemental bypass in the conventional mathematical optical character recognition (OCR) process pipeline. The evaluation experiments confirmed that (1) the performance of mathematical symbol and expression detection by the proposed method is superior to that of InftyReader, which is state-of-the-art software for mathematical OCR; (2) the coverage of the training dataset to the variation of document style is important; and (3) retraining with small additional training samples will be effective to improve the performance. An additional contribution is the release of a dataset for benchmarking the OCR for scientific documents.

AB - A detection method for mathematical expressions in scientific document images is proposed. Inspired by the promising performance of U-Net, a convolutional network architecture originally proposed for the semantic segmentation of biomedical images, the proposed method uses image conversion by a U-Net framework. The proposed method does not use any information from mathematical and linguistic grammar so that it can be a supplemental bypass in the conventional mathematical optical character recognition (OCR) process pipeline. The evaluation experiments confirmed that (1) the performance of mathematical symbol and expression detection by the proposed method is superior to that of InftyReader, which is state-of-the-art software for mathematical OCR; (2) the coverage of the training dataset to the variation of document style is important; and (3) retraining with small additional training samples will be effective to improve the performance. An additional contribution is the release of a dataset for benchmarking the OCR for scientific documents.

UR - http://www.scopus.com/inward/record.url?scp=85073622420&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85073622420&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2019.2945825

DO - 10.1109/ACCESS.2019.2945825

M3 - Article

AN - SCOPUS:85073622420

VL - 7

SP - 144030

EP - 144042

JO - IEEE Access

JF - IEEE Access

SN - 2169-3536

M1 - 8861044

ER -