Font distribution observation by network-based analysis

Chihiro Nakamoto, Rong Huang, Sota Koizumi, Ryosuke Ishida, Yaokai Feng, Seiichi Uchida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Off-the-shelf Optical Character Recognition (OCR) engines perform poorly on decorative characters, which often appear in natural scenes such as signboards. A reasonable step toward so-called camera-based OCR is to collect a large-scale font set and analyze the distribution of font samples, with the goal of building a character recognition engine that is robust to variations in font shape. This paper addresses font distribution analysis using a network. A Minimum Spanning Tree (MST) is used to construct a font network based on the Chamfer distance between samples. After clustering, a centrality criterion (closeness, eccentricity, or betweenness centrality) is used to extract typical font samples. The network structure also allows us to observe the transition of font shape between any two samples, which is useful for creating new fonts and for recognizing unseen decorative characters. Moreover, unlike Principal Component Analysis (PCA), the font network visualizes the distribution by measuring dissimilarities between samples directly, rather than through the lossy process of dimensionality reduction. Compared with the K-means algorithm, network-based clustering can preserve small font clusters, which generally consist of samples with unusual appearances. Experiments demonstrate that the proposed network-based analysis is an effective way to understand the font distribution, and thus provides helpful information for decorative character recognition.
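The abstract names the ingredients of the pipeline but not their exact definitions, so the Python sketch below is only a minimal illustration under stated assumptions, not the authors' implementation. It assumes each glyph is a set of foreground pixel coordinates, takes the Chamfer distance as the symmetric mean nearest-pixel distance (one common variant; the paper may use a distance-transform formulation instead), builds the MST with Prim's algorithm, and assumes clustering is done by cutting the heaviest MST edges, since the abstract does not state the clustering rule. Closeness, eccentricity, and betweenness centrality are computed on MST path lengths. All function names (`chamfer_distance`, `minimum_spanning_tree`, `tree_path`, `cut_clusters`, `typical_sample`) are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]              # (row, col) of one foreground pixel of a glyph
Graph = Dict[int, Dict[int, float]]  # node -> {neighbour: edge weight}


def chamfer_distance(a: Set[Point], b: Set[Point]) -> float:
    """Assumed variant: symmetric mean distance from each pixel to the nearest pixel of the other glyph."""
    def directed(p: Set[Point], q: Set[Point]) -> float:
        return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)
    return (directed(a, b) + directed(b, a)) / 2


def minimum_spanning_tree(glyphs: List[Set[Point]]) -> Graph:
    """Prim's algorithm on the complete graph whose edge weights are Chamfer distances."""
    tree: Graph = {i: {} for i in range(len(glyphs))}
    # best[i] = (cheapest known edge weight from i into the tree, tree endpoint of that edge)
    best = {i: (chamfer_distance(glyphs[0], glyphs[i]), 0) for i in range(1, len(glyphs))}
    while best:
        v = min(best, key=lambda i: best[i][0])
        w, u = best.pop(v)
        tree[u][v] = tree[v][u] = w
        for i in best:  # only existing keys are reassigned, so iterating is safe
            d = chamfer_distance(glyphs[v], glyphs[i])
            if d < best[i][0]:
                best[i] = (d, v)
    return tree


def tree_distances(tree: Graph, s: int) -> Dict[int, float]:
    """Path length from s to every node of its component (tree paths are unique)."""
    dist = {s: 0.0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v, w in tree[u].items():
            if v not in dist:
                dist[v] = dist[u] + w
                queue.append(v)
    return dist


def tree_path(tree: Graph, s: int, t: int, prev: int = -1) -> List[int]:
    """Samples on the unique MST path from s to t, read as a font shape transition."""
    if s == t:
        return [s]
    for v in tree[s]:
        if v != prev and (rest := tree_path(tree, v, t, s)):
            return [s] + rest
    return []


def cut_clusters(tree: Graph, k: int) -> Tuple[Graph, List[Set[int]]]:
    """Assumed clustering rule: delete the k-1 heaviest MST edges and keep the components."""
    edges = sorted({(w, min(u, v), max(u, v)) for u in tree for v, w in tree[u].items()}, reverse=True)
    forest: Graph = {u: dict(nbrs) for u, nbrs in tree.items()}
    for _, u, v in edges[:max(k - 1, 0)]:
        del forest[u][v], forest[v][u]
    clusters: List[Set[int]] = []
    for s in forest:
        if not any(s in c for c in clusters):
            clusters.append(set(tree_distances(forest, s)))
    return forest, clusters


def typical_sample(forest: Graph, cluster: Set[int], criterion: str = "closeness") -> int:
    """Most central member of a cluster under closeness, eccentricity or betweenness."""
    def closeness(v: int) -> float:
        d = tree_distances(forest, v)
        total = sum(d.values())
        return (len(d) - 1) / total if total else 0.0

    def eccentricity(v: int) -> float:
        return -max(tree_distances(forest, v).values())  # negated: smaller eccentricity wins

    def betweenness(v: int) -> float:
        # Removing v splits its tree into branches B_i; a pair taken from two different
        # branches has its unique path through v, so the count is sum_{i<j} |B_i| * |B_j|.
        without_v = {x: {y: w for y, w in n.items() if y != v} for x, n in forest.items() if x != v}
        sizes = [len(tree_distances(without_v, u)) for u in forest[v]]
        return sum(s * (sum(sizes) - s) for s in sizes) / 2

    score = {"closeness": closeness, "eccentricity": eccentricity, "betweenness": betweenness}[criterion]
    return max(sorted(cluster), key=score)


if __name__ == "__main__":
    # Toy glyphs: pixel pairs 1, 2 and 3 apart, plus one pair far away from the rest.
    glyphs = [{(0, 0), (0, 1)}, {(0, 0), (0, 2)}, {(0, 0), (0, 3)}, {(10, 10), (10, 11)}]
    mst = minimum_spanning_tree(glyphs)
    print(tree_path(mst, 0, 3))                            # [0, 1, 2, 3]
    forest, clusters = cut_clusters(mst, k=2)
    print(clusters)                                         # [{0, 1, 2}, {3}]
    print([typical_sample(forest, c) for c in clusters])   # [1, 3]
```

Cutting the k-1 heaviest MST edges is equivalent to single-linkage clustering, so an isolated, unusual glyph (sample 3 in the toy data) can survive as a cluster of its own; this is one way to see how a network-based clustering can keep small clusters that K-means may merge into larger ones. The brute-force Chamfer distance costs O(|a|·|b|) per pair and would need a distance transform or spatial index for a large font set.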

Original language: English
Title of host publication: Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers
Publisher: Springer Verlag
Pages: 83-97
Number of pages: 15
ISBN (Print): 9783319051666
DOI: https://doi.org/10.1007/978-3-319-05167-3_7
Publication status: Published - Jan 1 2014
Event: 5th International Workshop on Camera-Based Document Analysis and Recognition, CBDAR 2013 - Washington, DC, United States
Duration: Aug 23 2013 to Aug 23 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8357 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Workshop on Camera-Based Document Analysis and Recognition, CBDAR 2013
Country: United States
City: Washington, DC
Period: 8/23/13 to 8/23/13

Fingerprint

Optical character recognition
Character recognition
Character Recognition
Centrality
Engines
Electric power distribution
Principal component analysis
Visualization
Cameras
Engine
Clustering
Processing
Betweenness
K-means Algorithm
Distribution Network
Eccentricity
Minimum Spanning Tree
Dissimilarity
Dimensionality Reduction
Network Structure

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Nakamoto, C., Huang, R., Koizumi, S., Ishida, R., Feng, Y., & Uchida, S. (2014). Font distribution observation by network-based analysis. In Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers (pp. 83-97). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8357 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-05167-3_7

Font distribution observation by network-based analysis. / Nakamoto, Chihiro; Huang, Rong; Koizumi, Sota; Ishida, Ryosuke; Feng, Yaokai; Uchida, Seiichi.

Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers. Springer Verlag, 2014. p. 83-97 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8357 LNCS).

Nakamoto, C, Huang, R, Koizumi, S, Ishida, R, Feng, Y & Uchida, S 2014, Font distribution observation by network-based analysis. in Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8357 LNCS, Springer Verlag, pp. 83-97, 5th International Workshop on Camera-Based Document Analysis and Recognition, CBDAR 2013, Washington, DC, United States, 8/23/13. https://doi.org/10.1007/978-3-319-05167-3_7
Nakamoto C, Huang R, Koizumi S, Ishida R, Feng Y, Uchida S. Font distribution observation by network-based analysis. In Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers. Springer Verlag. 2014. p. 83-97. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-05167-3_7
Nakamoto, Chihiro ; Huang, Rong ; Koizumi, Sota ; Ishida, Ryosuke ; Feng, Yaokai ; Uchida, Seiichi. / Font distribution observation by network-based analysis. Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers. Springer Verlag, 2014. pp. 83-97 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{a6fc7f55963144c7a1d77fdd36187840,
title = "Font distribution observation by network-based analysis",
author = "Chihiro Nakamoto and Rong Huang and Sota Koizumi and Ryosuke Ishida and Yaokai Feng and Seiichi Uchida",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/978-3-319-05167-3_7",
language = "English",
isbn = "9783319051666",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "83--97",
booktitle = "Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers",
address = "Germany",

}

TY - GEN

T1 - Font distribution observation by network-based analysis

AU - Nakamoto, Chihiro

AU - Huang, Rong

AU - Koizumi, Sota

AU - Ishida, Ryosuke

AU - Feng, Yaokai

AU - Uchida, Seiichi

PY - 2014/1/1

Y1 - 2014/1/1

AB - The off-the-shelf Optical Character Recognition (OCR) engines return mediocre performance on the decorative characters which usually appear in natural scenes such as signboards. A reasonable way towards the so-called camera-based OCR is to collect a large-scale font set and analyze the distribution of font samples for realizing some character recognition engine which is tolerant to font shape variations. This paper is concerned with the issue of font distribution analysis by network. Minimum Spanning Tree (MST) is employed to construct font network with respect to Chamfer distance. After clustering, some centrality criterion, namely closeness centrality, eccentricity centrality or betweenness centrality, is introduced for extracting typical font samples. The network structure allows us to observe the font shape transition between any two samples, which is useful to create new fonts and recognize unseen decorative characters. Moreover, unlike the Principal Component Analysis (PCA), the font network fulfills distribution visualization through measuring the dissimilarity between samples rather than the lossy processing of dimensionality reduction. Compared with K-means algorithm, network-based clustering has the ability to preserve small size font clusters which generally consist of samples taking special appearances. Experiments demonstrate that the proposed network-based analysis is an effective way to grasp font distribution, and thus provides helpful information for decorative character recognition.

UR - http://www.scopus.com/inward/record.url?scp=84958521489&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84958521489&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-05167-3_7

DO - 10.1007/978-3-319-05167-3_7

M3 - Conference contribution

SN - 9783319051666

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 83

EP - 97

BT - Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers

PB - Springer Verlag

ER -