TY - GEN
T1 - Page segmentation using a convolutional neural network with trainable co-occurrence features
AU - Lee, Joonho
AU - Hayashi, Hideaki
AU - Ohyama, Wataru
AU - Uchida, Seiichi
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP17H06100.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - In document analysis, page segmentation is a fundamental task that divides a document image into semantic regions. In addition to local features, such as pixel-wise information, co-occurrence features are also useful for extracting texture-like periodic information for accurate segmentation. However, existing convolutional neural network (CNN)-based methods do not have any mechanisms that explicitly extract co-occurrence features. In this paper, we propose a method for page segmentation using a CNN with trainable multiplication layers (TMLs). The TML is specialized for extracting co-occurrences from feature maps, thereby supporting the detection of objects with similar textures and periodicities. This property is also considered to be effective for document image analysis because of regularity in text line structures, tables, etc. In the experiment, we achieved promising performance on a pixel-wise page segmentation task by combining TMLs with U-Net. The results demonstrate that TMLs can improve performance compared to the original U-Net. The results also demonstrate that TMLs are helpful for detecting regions with periodically repeating features, such as tables and main text.
AB - In document analysis, page segmentation is a fundamental task that divides a document image into semantic regions. In addition to local features, such as pixel-wise information, co-occurrence features are also useful for extracting texture-like periodic information for accurate segmentation. However, existing convolutional neural network (CNN)-based methods do not have any mechanisms that explicitly extract co-occurrence features. In this paper, we propose a method for page segmentation using a CNN with trainable multiplication layers (TMLs). The TML is specialized for extracting co-occurrences from feature maps, thereby supporting the detection of objects with similar textures and periodicities. This property is also considered to be effective for document image analysis because of regularity in text line structures, tables, etc. In the experiment, we achieved promising performance on a pixel-wise page segmentation task by combining TMLs with U-Net. The results demonstrate that TMLs can improve performance compared to the original U-Net. The results also demonstrate that TMLs are helpful for detecting regions with periodically repeating features, such as tables and main text.
UR - http://www.scopus.com/inward/record.url?scp=85079877301&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079877301&partnerID=8YFLogxK
U2 - 10.1109/ICDAR.2019.00167
DO - 10.1109/ICDAR.2019.00167
M3 - Conference contribution
AN - SCOPUS:85079877301
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 1023
EP - 1028
BT - Proceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
PB - IEEE Computer Society
T2 - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
Y2 - 20 September 2019 through 25 September 2019
ER -