A new method for multi-oriented graphics-scene-3D text classification in video

Jiamin Xu, Palaiahnakote Shivakumara, Tong Lu, Chew Lim Tan, Seiichi Uchida

Research output: Contribution to journal › Article


Abstract

Text detection and recognition in video is challenging because video contains several types of text: graphics text (video captions), scene text (natural text), 2D and 3D text, and static and dynamic text. Developing a single method that works well for all of these types is hard. In this paper, we propose a novel method for classifying graphics-scene and 2D-3D text in video to improve text detection and recognition accuracy. We first propose an iterative method that separates static and dynamic clusters, based on the observation that static text has zero velocity while dynamic text has non-zero velocity. This yields text candidates for both static and dynamic text, regardless of whether the text is 2D or 3D. We then define a symmetry criterion for the text candidates based on stroke width distances and medial axis values, which yields potential text candidates. We group the potential text candidates by their geometrical properties to form text regions. Next, for each text region, we analyze the distribution of the dominant medial axis values given by the ring radius transform to classify graphics and scene text. Similarly, we analyze the proximity among pixels that satisfy gradient direction symmetry to classify 2D and 3D text. We evaluate each step of the proposed method in terms of classification and recognition rates and compare it with existing methods, showing that video text classification is effective and necessary for enhancing the capability of current text detection and recognition systems.
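The first step above, separating static from dynamic text by velocity, can be illustrated with a small sketch. The abstract does not spell out the iterative procedure, so the code below is a hypothetical stand-in rather than the authors' method: it assumes each text candidate has already been tracked as a sequence of centroid positions over consecutive frames, pins the static cluster centre at zero velocity, and iteratively re-estimates a single dynamic cluster centre. The names `mean_speed` and `split_static_dynamic` and the jitter tolerance `noise_tol` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) centroid of a tracked text candidate


def mean_speed(track: Sequence[Point]) -> float:
    """Average per-frame displacement (pixels/frame) of a tracked centroid."""
    if len(track) < 2:
        return 0.0
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    return sum(steps) / len(steps)


def split_static_dynamic(tracks: List[Sequence[Point]],
                         noise_tol: float = 0.5,
                         max_iter: int = 100) -> List[bool]:
    """Label each track True (dynamic) or False (static).

    Hypothetical iterative two-cluster split on mean speed: the static centre
    is pinned at zero velocity, and the dynamic centre is re-estimated as the
    mean speed of the tracks currently assigned to it. Speeds below
    `noise_tol` (an assumed tracking-jitter tolerance) are treated as zero.
    """
    speeds = [mean_speed(t) for t in tracks]
    moving = [s for s in speeds if s >= noise_tol]
    if not moving:
        return [False] * len(speeds)
    dyn_centre = max(moving)
    labels: List[bool] = []
    for _ in range(max_iter):
        # Nearer to the dynamic centre than to zero velocity -> dynamic.
        new_labels = [s >= noise_tol and abs(s - dyn_centre) < s for s in speeds]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        dyn_speeds = [s for s, d in zip(speeds, labels) if d]
        if not dyn_speeds:  # only reachable if noise_tol <= 0
            break
        dyn_centre = sum(dyn_speeds) / len(dyn_speeds)
    return labels


if __name__ == "__main__":
    tracks = [
        [(10, 10), (10, 10), (10, 10)],    # caption fixed on screen
        [(50, 20), (50.2, 20), (50, 20)],  # static text with tracking jitter
        [(0, 0), (5, 0), (10, 0)],         # scrolling at 5 px/frame
        [(0, 0), (0, 6), (0, 12)],         # moving at 6 px/frame
    ]
    print(split_static_dynamic(tracks))  # [False, False, True, True]
```

Pinning one centre at zero mirrors the abstract's premise that static text has zero velocity, while the tolerance absorbs small tracking errors so that a jittering caption is not mistaken for moving text.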
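The graphics/scene step relies on medial axis values from the ring radius transform (RRT). The sketch below assumes that the RRT value of a pixel is its distance to the nearest edge pixel, takes medial axis values as RRT ridge values inside character strokes (roughly half the edge-to-edge stroke width), and summarizes their distribution by the share of values falling in the most frequent integer bin. The abstract does not say how this distribution separates graphics from scene text, so the `dominant_fraction` feature and all function names here are hypothetical; a brute-force distance computation keeps the example dependency-free.

```python
import math
from collections import Counter
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def ring_radius_transform(h: int, w: int, edges: Set[Pixel]) -> List[List[float]]:
    """Assumed RRT: Euclidean distance from each pixel to its nearest edge pixel.

    Brute force, O(h * w * |edges|); adequate for small text regions.
    """
    if not edges:
        raise ValueError("edge map is empty")
    return [[min(math.hypot(r - er, c - ec) for er, ec in edges)
             for c in range(w)]
            for r in range(h)]


def medial_axis_values(rrt: List[List[float]], stroke: Set[Pixel]) -> List[float]:
    """RRT values at stroke pixels lying on a ridge of the RRT.

    A pixel is kept if, along the horizontal or the vertical direction, no
    in-image neighbour exceeds it and at least one neighbour is strictly smaller.
    """
    h, w = len(rrt), len(rrt[0])

    def value_at(r: int, c: int) -> Optional[float]:
        return rrt[r][c] if 0 <= r < h and 0 <= c < w else None

    values: List[float] = []
    for r, c in sorted(stroke):
        v = rrt[r][c]
        if v == 0:  # edge pixels cannot lie on the medial axis
            continue
        for dr, dc in ((0, 1), (1, 0)):  # horizontal profile, then vertical
            pair = (value_at(r - dr, c - dc), value_at(r + dr, c + dc))
            nbrs = [x for x in pair if x is not None]
            if nbrs and all(x <= v for x in nbrs) and any(x < v for x in nbrs):
                values.append(v)
                break  # count each pixel once
    return values


def dominant_fraction(values: List[float]) -> Tuple[int, float]:
    """Most frequent integer-rounded medial value and the share of values in that bin."""
    if not values:
        return 0, 0.0
    counts = Counter(round(v) for v in values)
    value, n = counts.most_common(1)[0]
    return value, n / len(values)


if __name__ == "__main__":
    # Synthetic vertical stroke: edges at columns 1 and 5, stroke pixels in 2..4.
    h, w = 3, 7
    edges = {(r, c) for r in range(h) for c in (1, 5)}
    stroke = {(r, c) for r in range(h) for c in (2, 3, 4)}
    mav = medial_axis_values(ring_radius_transform(h, w, edges), stroke)
    print(mav, dominant_fraction(mav))  # [2.0, 2.0, 2.0] (2, 1.0)
```

On this uniform synthetic stroke every medial value is 2, so the dominant bin holds all of them. Any threshold on this share, if such a rule were used, would have to be tuned on labelled graphics and scene text.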

Original language: English
Pages (from-to): 19-42
Number of pages: 24
Journal: Pattern Recognition
Volume: 49
DOI: 10.1016/j.patcog.2015.07.002
Publication status: Published - Jan 1 2016


All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

A new method for multi-oriented graphics-scene-3D text classification in video. / Xu, Jiamin; Shivakumara, Palaiahnakote; Lu, Tong; Tan, Chew Lim; Uchida, Seiichi.

In: Pattern Recognition, Vol. 49, 01.01.2016, p. 19-42.


Xu, Jiamin ; Shivakumara, Palaiahnakote ; Lu, Tong ; Tan, Chew Lim ; Uchida, Seiichi. / A new method for multi-oriented graphics-scene-3D text classification in video. In: Pattern Recognition. 2016 ; Vol. 49. pp. 19-42.

@article{ff0d51b4546a4017affc4d52c990b1af,
title = "A new method for multi-oriented graphics-scene-3D text classification in video",
abstract = "Text detection and recognition in video is challenging due to the presence of different types of texts, namely, graphics (video caption), scene (natural text), 2D, 3D, static and dynamic texts. Developing a universal method that works well for all the types is hard. In this paper, we propose a novel method for classifying graphics-scene and 2D-3D texts in video to enhance text detection and recognition accuracies. We first propose an iterative method to classify static and dynamic clusters based on the fact that static texts have zero velocity while dynamic texts have non-zero velocity. This results in text candidates for both static and dynamic texts regardless of 2D and 3D types. We then propose symmetry for text candidates using stroke width distances and medial axis values, which results in potential text candidates. We group potential text candidates using their geometrical properties to form text regions. Next, for each text region, we study the distribution of the dominant medial axis values given by ring radius transform in a new way to classify graphics and scene texts. Similarly, we study the proximity among the pixels that satisfy the gradient directions symmetry to classify 2D and 3D texts. We evaluate each step of the proposed method in terms of classification and recognition rates through classification with the existing methods to show that video text classification is effective and necessary for enhancing the capability of current text detection and recognition systems.",
author = "Jiamin Xu and Palaiahnakote Shivakumara and Tong Lu and Tan, {Chew Lim} and Seiichi Uchida",
year = "2016",
month = "1",
day = "1",
doi = "10.1016/j.patcog.2015.07.002",
language = "English",
volume = "49",
pages = "19--42",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - A new method for multi-oriented graphics-scene-3D text classification in video

AU - Xu, Jiamin

AU - Shivakumara, Palaiahnakote

AU - Lu, Tong

AU - Tan, Chew Lim

AU - Uchida, Seiichi

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Text detection and recognition in video is challenging due to the presence of different types of texts, namely, graphics (video caption), scene (natural text), 2D, 3D, static and dynamic texts. Developing a universal method that works well for all the types is hard. In this paper, we propose a novel method for classifying graphics-scene and 2D-3D texts in video to enhance text detection and recognition accuracies. We first propose an iterative method to classify static and dynamic clusters based on the fact that static texts have zero velocity while dynamic texts have non-zero velocity. This results in text candidates for both static and dynamic texts regardless of 2D and 3D types. We then propose symmetry for text candidates using stroke width distances and medial axis values, which results in potential text candidates. We group potential text candidates using their geometrical properties to form text regions. Next, for each text region, we study the distribution of the dominant medial axis values given by ring radius transform in a new way to classify graphics and scene texts. Similarly, we study the proximity among the pixels that satisfy the gradient directions symmetry to classify 2D and 3D texts. We evaluate each step of the proposed method in terms of classification and recognition rates through classification with the existing methods to show that video text classification is effective and necessary for enhancing the capability of current text detection and recognition systems.

AB - Text detection and recognition in video is challenging due to the presence of different types of texts, namely, graphics (video caption), scene (natural text), 2D, 3D, static and dynamic texts. Developing a universal method that works well for all the types is hard. In this paper, we propose a novel method for classifying graphics-scene and 2D-3D texts in video to enhance text detection and recognition accuracies. We first propose an iterative method to classify static and dynamic clusters based on the fact that static texts have zero velocity while dynamic texts have non-zero velocity. This results in text candidates for both static and dynamic texts regardless of 2D and 3D types. We then propose symmetry for text candidates using stroke width distances and medial axis values, which results in potential text candidates. We group potential text candidates using their geometrical properties to form text regions. Next, for each text region, we study the distribution of the dominant medial axis values given by ring radius transform in a new way to classify graphics and scene texts. Similarly, we study the proximity among the pixels that satisfy the gradient directions symmetry to classify 2D and 3D texts. We evaluate each step of the proposed method in terms of classification and recognition rates through classification with the existing methods to show that video text classification is effective and necessary for enhancing the capability of current text detection and recognition systems.

UR - http://www.scopus.com/inward/record.url?scp=84942830069&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942830069&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2015.07.002

DO - 10.1016/j.patcog.2015.07.002

M3 - Article

AN - SCOPUS:84942830069

VL - 49

SP - 19

EP - 42

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

ER -