TY - GEN
T1 - Font distribution observation by network-based analysis
AU - Nakamoto, Chihiro
AU - Huang, Rong
AU - Koizumi, Sota
AU - Ishida, Ryosuke
AU - Feng, Yaokai
AU - Uchida, Seiichi
PY - 2014/1/1
Y1 - 2014/1/1
N2 - Off-the-shelf Optical Character Recognition (OCR) engines perform poorly on the decorative characters that often appear in natural scenes, for example on signboards. A reasonable route towards camera-based OCR is to collect a large-scale font set and analyze the distribution of its samples, with the aim of building a character recognition engine that tolerates variations in font shape. This paper addresses font distribution analysis by means of a network. A Minimum Spanning Tree (MST) is used to construct the font network with respect to the Chamfer distance. After clustering, a centrality criterion (closeness centrality, eccentricity centrality, or betweenness centrality) is used to extract typical font samples. The network structure makes it possible to observe the font shape transition between any two samples, which is useful for creating new fonts and recognizing unseen decorative characters. Moreover, unlike Principal Component Analysis (PCA), the font network visualizes the distribution by measuring the dissimilarity between samples rather than through lossy dimensionality reduction. Compared with the K-means algorithm, network-based clustering can preserve small font clusters, which generally consist of samples with unusual appearances. Experiments demonstrate that the proposed network-based analysis is an effective way to grasp the font distribution and thus provides helpful information for decorative character recognition.
UR - http://www.scopus.com/inward/record.url?scp=84958521489&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958521489&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-05167-3_7
DO - 10.1007/978-3-319-05167-3_7
M3 - Conference contribution
AN - SCOPUS:84958521489
SN - 9783319051666
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 83
EP - 97
BT - Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers
PB - Springer Verlag
T2 - 5th International Workshop on Camera-Based Document Analysis and Recognition, CBDAR 2013
Y2 - 23 August 2013 through 23 August 2013
ER -