TY - JOUR
T1 - Dynamic mode decomposition via convolutional autoencoders for dynamics modeling in videos
AU - Ul Haq, Israr
AU - Iwata, Tomoharu
AU - Kawahara, Yoshinobu
N1 - Funding Information:
This work was supported by AMED (Grant Number JP19dm0307009), JSPS KAKENHI (Grant Number JP18H03287), and JST CREST (Grant Number JPMJCR1913).
Publisher Copyright:
© 2021 The Author(s)
PY - 2022/2
Y1 - 2022/2
N2 - Extracting the underlying dynamics of objects in image sequences is one of the challenging problems in computer vision. Meanwhile, dynamic mode decomposition (DMD) has recently attracted attention as a method for obtaining modal representations of nonlinear dynamics from general multivariate time-series data without explicit prior information about the dynamics. In this paper, we propose a convolutional autoencoder (CAE)-based DMD (CAE-DMD) for accurate modeling of the underlying dynamics in videos. We develop a modified CAE model that encodes images to latent vectors and apply DMD on the latent vectors to extract DMD modes. These modes are split into background and foreground modes for foreground modeling in videos, or used for video classification tasks. The latent vectors are then mapped through a decoder to recover the input image sequences. We train the network in an end-to-end manner, i.e., by minimizing the mean squared error between the original and reconstructed images, and thereby obtain an accurate extraction of the underlying dynamic information in the videos. We empirically investigate the performance of CAE-DMD in two applications, background/foreground extraction and video classification, on synthetic and publicly available datasets.
AB - Extracting the underlying dynamics of objects in image sequences is one of the challenging problems in computer vision. Meanwhile, dynamic mode decomposition (DMD) has recently attracted attention as a method for obtaining modal representations of nonlinear dynamics from general multivariate time-series data without explicit prior information about the dynamics. In this paper, we propose a convolutional autoencoder (CAE)-based DMD (CAE-DMD) for accurate modeling of the underlying dynamics in videos. We develop a modified CAE model that encodes images to latent vectors and apply DMD on the latent vectors to extract DMD modes. These modes are split into background and foreground modes for foreground modeling in videos, or used for video classification tasks. The latent vectors are then mapped through a decoder to recover the input image sequences. We train the network in an end-to-end manner, i.e., by minimizing the mean squared error between the original and reconstructed images, and thereby obtain an accurate extraction of the underlying dynamic information in the videos. We empirically investigate the performance of CAE-DMD in two applications, background/foreground extraction and video classification, on synthetic and publicly available datasets.
UR - http://www.scopus.com/inward/record.url?scp=85123033639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123033639&partnerID=8YFLogxK
U2 - 10.1016/j.cviu.2021.103355
DO - 10.1016/j.cviu.2021.103355
M3 - Article
AN - SCOPUS:85123033639
VL - 216
JO - Computer Vision and Image Understanding
JF - Computer Vision and Image Understanding
SN - 1077-3142
M1 - 103355
ER -