Mining the displacement of max-pooling for text recognition

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

The max-pooling operation in convolutional neural networks (CNNs) downsamples the feature maps of convolutional layers. However, in doing so, it loses some spatial information. In this paper, we extract novel features from pooling layers, called displacement features, and combine them with the features resulting from max-pooling to capture structural deformations in text recognition tasks. The displacement features record the location of the maximal value in a max-pooling operation. Furthermore, we analyze and mine the class-wise trends of the displacement features. Extensive experimental results and discussions demonstrate that the proposed displacement features improve the performance of CNN-based architectures and address the structural deformations that max-pooling fails to capture in text recognition tasks.
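The idea of recording where the maximum falls inside each pooling window can be illustrated with a minimal sketch. The abstract does not specify the window size, stride, or how the location is encoded, so the choices below are assumptions made only for illustration: non-overlapping k x k windows, a (row, column) offset relative to the window's top-left corner, and first-maximum-wins on ties. The function name `max_pool_with_displacement` is hypothetical and is not taken from the paper.

```python
def max_pool_with_displacement(fmap, k=2):
    """Max-pool a 2-D feature map (list of lists) with non-overlapping k x k windows.

    Returns (pooled, disp): pooled[i][j] is the window maximum, and
    disp[i][j] is the (row, col) offset of that maximum inside the window.
    The offset encoding is an assumption; the paper may encode it differently.
    """
    rows, cols = len(fmap), len(fmap[0])
    if rows % k or cols % k:
        raise ValueError("feature map size must be divisible by k")

    pooled, disp = [], []
    for i in range(0, rows, k):
        pooled_row, disp_row = [], []
        for j in range(0, cols, k):
            # Scan the window; on ties the first maximum in row-major order is kept.
            best, best_off = fmap[i][j], (0, 0)
            for a in range(k):
                for b in range(k):
                    if fmap[i + a][j + b] > best:
                        best, best_off = fmap[i + a][j + b], (a, b)
            pooled_row.append(best)
            disp_row.append(best_off)
        pooled.append(pooled_row)
        disp.append(disp_row)
    return pooled, disp


# Toy 4 x 4 feature map (made-up values, not data from the paper).
feature_map = [
    [1, 5, 2, 0],
    [3, 4, 9, 1],
    [0, 0, 7, 8],
    [6, 2, 3, 4],
]
pooled, disp = max_pool_with_displacement(feature_map, k=2)
```

Here `pooled` is what ordinary max-pooling would pass on, while `disp` keeps the within-window position that pooling normally discards. How the two are combined downstream is described in the paper itself and is not reproduced here.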

Original language: English
Pages (from-to): 558-569
Number of pages: 12
Journal: Pattern Recognition
Volume: 93
Publication status: Published - Sep 2019

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

