Mining the displacement of max-pooling for text recognition

Research output: Contribution to journal, Article, peer-reviewed

24 Citations (Scopus)

Abstract

The max-pooling operation in convolutional neural networks (CNNs) downsamples the feature maps of convolutional layers. However, in doing so, it loses some spatial information. In this paper, we extract a novel feature from pooling layers, called displacement features, and combine it with the features resulting from max-pooling to capture structural deformations in text recognition tasks. The displacement features record the location of the maximal value in each max-pooling operation. Furthermore, we analyze and mine the class-wise trends of the displacement features. Extensive experimental results and discussions demonstrate that the proposed displacement features can improve the performance of CNN-based architectures and tackle the issues that max-pooling has with structural deformations in text recognition tasks.
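The idea of recording where each maximum sits can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `max_pool_with_displacement`, the non-overlapping k x k windows, and the encoding of the displacement as a (row, col) offset inside each window are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def max_pool_with_displacement(x, k=2):
    """Non-overlapping k x k max-pooling over a 2D feature map.

    Hypothetical helper (not from the paper). Returns (pooled, disp), where
    disp[i, j] holds the (row, col) offset of the maximum inside window (i, j).
    """
    h, w = x.shape
    if h % k or w % k:
        raise ValueError("feature map size must be divisible by k")
    # Group the map into windows: result index is [win_row, win_col, a, b].
    windows = x.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3)
    # Flatten each window so max/argmax can run over a single axis.
    flat = windows.reshape(h // k, w // k, k * k)
    pooled = flat.max(axis=-1)
    idx = flat.argmax(axis=-1)  # first maximum wins on ties
    # Convert the flat index back to a (row, col) offset within the window.
    disp = np.stack(np.unravel_index(idx, (k, k)), axis=-1)
    return pooled, disp


x = np.array([[1, 5, 2, 0],
              [3, 4, 7, 1],
              [9, 2, 0, 0],
              [1, 1, 0, 6]])
pooled, disp = max_pool_with_displacement(x)
print(pooled)      # pooled values, shape (2, 2)
print(disp[0, 0])  # offset of the maximum in the top-left window
```

Ordinary max-pooling would keep only `pooled`; the extra `disp` array preserves the spatial information that pooling discards. How the two are combined downstream (for example, by concatenating them along a channel axis) is a further assumption and is left out of the sketch.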

Original language: English
Pages (from-to): 558-569
Number of pages: 12
Journal: Pattern Recognition
Volume: 93
Publication status: Published - Sep 2019

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
