Mining the displacement of max-pooling for text recognition

Yuchen Zheng, Brian Kenji Iwana, Seiichi Uchida

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

The max-pooling operation in convolutional neural networks (CNNs) downsamples the feature maps of convolutional layers. However, in doing so, it loses some spatial information. In this paper, we extract a novel feature from pooling layers, called displacement features, and combine them with the features resulting from max-pooling to capture the structural deformations for text recognition tasks. The displacement features record the location of the maximal value in a max-pooling operation. Furthermore, we analyze and mine the class-wise trends of the displacement features. The extensive experimental results and discussions demonstrate that the proposed displacement features can improve the performance of CNN-based architectures and tackle the issues with the structural deformations of max-pooling in text recognition tasks.
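As an illustration of the idea of recording "the location of the maximal value in a max-pooling operation", here is a minimal pure-Python sketch. It is not the authors' implementation: the function name `max_pool_with_displacement`, the non-overlapping k x k windows, the top-left-corner offset convention, and the tie-breaking rule are all assumptions made for this example.

```python
from typing import List, Tuple

Grid = List[List[float]]
Offsets = List[List[Tuple[int, int]]]


def max_pool_with_displacement(fmap: Grid, k: int = 2) -> Tuple[Grid, Offsets]:
    """Non-overlapping k x k max-pooling that also records where each max was.

    Hypothetical helper for illustration only. displacement[i][j] is the
    (row, col) offset of the maximum inside window (i, j), measured from the
    window's top-left corner (an assumed convention). Ties go to the first
    maximum in row-major order. Trailing rows or columns that do not fill a
    whole window are dropped.
    """
    out_h = len(fmap) // k
    out_w = len(fmap[0]) // k
    pooled: Grid = []
    displacement: Offsets = []
    for i in range(out_h):
        pooled_row = []
        disp_row = []
        for j in range(out_w):
            best_val = fmap[i * k][j * k]
            best_off = (0, 0)
            for di in range(k):
                for dj in range(k):
                    value = fmap[i * k + di][j * k + dj]
                    if value > best_val:
                        best_val, best_off = value, (di, dj)
            pooled_row.append(best_val)   # what ordinary max-pooling keeps
            disp_row.append(best_off)     # what ordinary max-pooling discards
        pooled.append(pooled_row)
        displacement.append(disp_row)
    return pooled, displacement


feature_map = [
    [1.0, 3.0, 0.0, 2.0],
    [2.0, 0.0, 5.0, 1.0],
    [0.0, 0.0, 7.0, 6.0],
    [9.0, 8.0, 1.0, 1.0],
]
pooled, displacement = max_pool_with_displacement(feature_map)
print(pooled)        # [[3.0, 5.0], [9.0, 7.0]]
print(displacement)  # [[(0, 1), (1, 0)], [(1, 0), (0, 0)]]
```

The `pooled` grid is the usual downsampled feature map, while `displacement` keeps the spatial information that pooling would otherwise lose. How the two are combined and fed to later layers is described in the paper itself and is not reproduced here.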

Original language: English
Pages (from-to): 558-569
Number of pages: 12
Journal: Pattern Recognition
Volume: 93
DOI: 10.1016/j.patcog.2019.05.014
Publication status: Published - Sep 1 2019

Fingerprint

Neural networks

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Mining the displacement of max-pooling for text recognition. / Zheng, Yuchen; Iwana, Brian Kenji; Uchida, Seiichi.

In: Pattern Recognition, Vol. 93, 01.09.2019, p. 558-569.

Zheng, Yuchen ; Iwana, Brian Kenji ; Uchida, Seiichi. / Mining the displacement of max-pooling for text recognition. In: Pattern Recognition. 2019 ; Vol. 93. pp. 558-569.
@article{59b680464c4349fdb4410dd4b8c2d553,
title = "Mining the displacement of max-pooling for text recognition",
abstract = "The max-pooling operation in convolutional neural networks (CNNs) downsamples the feature maps of convolutional layers. However, in doing so, it loses some spatial information. In this paper, we extract a novel feature from pooling layers, called displacement features, and combine them with the features resulting from max-pooling to capture the structural deformations for text recognition tasks. The displacement features record the location of the maximal value in a max-pooling operation. Furthermore, we analyze and mine the class-wise trends of the displacement features. The extensive experimental results and discussions demonstrate that the proposed displacement features can improve the performance of CNN-based architectures and tackle the issues with the structural deformations of max-pooling in text recognition tasks.",
author = "Yuchen Zheng and Iwana, {Brian Kenji} and Seiichi Uchida",
year = "2019",
month = "9",
day = "1",
doi = "10.1016/j.patcog.2019.05.014",
language = "English",
volume = "93",
pages = "558--569",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Mining the displacement of max-pooling for text recognition

AU - Zheng, Yuchen

AU - Iwana, Brian Kenji

AU - Uchida, Seiichi

PY - 2019/9/1

Y1 - 2019/9/1

AB - The max-pooling operation in convolutional neural networks (CNNs) downsamples the feature maps of convolutional layers. However, in doing so, it loses some spatial information. In this paper, we extract a novel feature from pooling layers, called displacement features, and combine them with the features resulting from max-pooling to capture the structural deformations for text recognition tasks. The displacement features record the location of the maximal value in a max-pooling operation. Furthermore, we analyze and mine the class-wise trends of the displacement features. The extensive experimental results and discussions demonstrate that the proposed displacement features can improve the performance of CNN-based architectures and tackle the issues with the structural deformations of max-pooling in text recognition tasks.

UR - http://www.scopus.com/inward/record.url?scp=85065551518&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065551518&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2019.05.014

DO - 10.1016/j.patcog.2019.05.014

M3 - Article

AN - SCOPUS:85065551518

VL - 93

SP - 558

EP - 569

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

ER -