TY - JOUR
T1 - A new method for multi-oriented graphics-scene-3D text classification in video
AU - Xu, Jiamin
AU - Shivakumara, Palaiahnakote
AU - Lu, Tong
AU - Tan, Chew Lim
AU - Uchida, Seiichi
N1 - Funding Information:
The work described in this paper was supported by the Natural Science Foundation of China under Grant nos. 61272218 and 61321491, and the Program for New Century Excellent Talents under NCET-11-0232. The work was also partly supported by the University of Malaya HIR under Grant no. UM.C/625/1/HIR/MOHE/ENG/42. Jiamin Xu is now a research graduate student at the Department of Computer Science and Technology, Nanjing University. His current interests are in the areas of document analysis, computer vision and pattern recognition algorithms. Palaiahnakote Shivakumara is a Visiting Senior Lecturer in the Department of Computer Systems and Information Technology, Faculty of Computer Science and Information Technology, University of Malaya. He received B.Sc., M.Sc., M.Sc. Technology (by research) and Ph.D. degrees in computer science in 1995, 1999, 2001 and 2005, respectively, from the University of Mysore, Mysore, Karnataka, India. From 1999 to 2005, he was a Project Associate in the Department of Studies in Computer Science, University of Mysore, where he conducted research on document image analysis, including document image mosaicing, character recognition, skew detection, face detection and face recognition. He worked as a Research Fellow in the field of image processing and multimedia in the Department of Computer Science, School of Computing, National University of Singapore, from 2005 to 2007. He also worked as a Research Consultant at Nanyang Technological University, Singapore, for a period of 6 months in 2007 on image classification. He worked as a Research Fellow (RF) at the National University of Singapore (NUS) from 2008 to 2013 on video text extraction and recognition. He has published more than 130 research papers in national and international conferences and journals. He has been a reviewer for several conferences and journals.
His research interests are in image processing and pattern recognition, including text extraction from video and document image processing. Tong Lu received the Ph.D. degree in computer science from Nanjing University in 2005. He received his M.Sc. and B.Sc. degrees from the same university in 2002 and 1993, respectively. He served as Associate Professor and Assistant Professor in the Department of Computer Science and Technology at Nanjing University from 2007 and 2005, respectively. He is now a full Professor at the same university. He has also served as a Visiting Scholar at the National University of Singapore and at the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He is also a member of the National Key Laboratory of Novel Software Technology in China. He has published over 60 papers and authored 2 books in his area of interest, and has been issued more than 20 international or Chinese invention patents. His current interests are in the areas of multimedia, computer vision and pattern recognition algorithms/systems. Tong Lu was a member of ACM, IAPR, ISAI and a senior member of China Computer Federation (CCF). He is the Youth Associate Editor of Journal on Frontiers of Computer Science (FCS), and has served as the Secretary-general of CAD&CG Committee of Jiangsu Computer Federation in China since 2008. He has been a program committee member or session chair of more than 10 international scientific conferences, and the Chair of Organization Committee of Youth Scholar Forum of State Key Laboratory for Novel Software Technology since 2010. Chew Lim Tan received the B.Sc. (hons.) degree in physics from the University of Singapore, Singapore, in 1971, the M.Sc. degree in radiation studies from the University of Surrey, Surrey, U.K., in 1973, and the Ph.D. degree in computer science from the University of Virginia, Charlottesville, in 1986.
He is currently a Professor with the Department of Computer Science, School of Computing, National University of Singapore, Singapore. His current research interests include document image analysis, text and natural language processing, neural networks, and genetic programming. He has published more than 360 research publications in these areas. Dr. Tan is an Associate Editor of Pattern Recognition and the ACM Transactions on Asian Language Information Processing, and is an Editorial Member of the International Journal on Document Analysis and Recognition. He is a member of the Governing Board of the International Association of Pattern Recognition. Seiichi Uchida received B.E., M.E. and Dr. Eng. degrees from Kyushu University in 1990, 1992 and 1999, respectively. From 1992 to 1996, he was with SECOM Co., Ltd., Japan. Currently, he is a professor at Kyushu University. His research interests include pattern recognition and image processing. He received the 2002 IEICE PRMU Research Encouraging Award, the 2008 IEICE Best Paper Award, the MIRU2006 Nagao Award (best paper award), the MIRU2011 Excellent Paper Award, and the 2010 ICFHR Best Paper Award. Dr. Uchida is a member of IEEE and IPSJ.
PY - 2016/1/1
Y1 - 2016/1/1
AB - Text detection and recognition in video is challenging due to the presence of different types of texts, namely, graphics (video caption), scene (natural text), 2D, 3D, static and dynamic texts. Developing a universal method that works well for all the types is hard. In this paper, we propose a novel method for classifying graphics-scene and 2D-3D texts in video to enhance text detection and recognition accuracies. We first propose an iterative method to classify static and dynamic clusters based on the fact that static texts have zero velocity while dynamic texts have non-zero velocity. This results in text candidates for both static and dynamic texts regardless of 2D and 3D types. We then propose symmetry for text candidates using stroke width distances and medial axis values, which results in potential text candidates. We group potential text candidates using their geometrical properties to form text regions. Next, for each text region, we study the distribution of the dominant medial axis values given by ring radius transform in a new way to classify graphics and scene texts. Similarly, we study the proximity among the pixels that satisfy the gradient directions symmetry to classify 2D and 3D texts. We evaluate each step of the proposed method in terms of classification and recognition rates through classification with the existing methods to show that video text classification is effective and necessary for enhancing the capability of current text detection and recognition systems.
UR - http://www.scopus.com/inward/record.url?scp=84942830069&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84942830069&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2015.07.002
DO - 10.1016/j.patcog.2015.07.002
M3 - Article
AN - SCOPUS:84942830069
SN - 0031-3203
VL - 49
SP - 19
EP - 42
JO - Pattern Recognition
JF - Pattern Recognition
ER -