Extracting the underlying dynamics of objects in image sequences is one of the challenging problems in computer vision. Meanwhile, dynamic mode decomposition (DMD) has recently attracted attention as a method for obtaining modal representations of nonlinear dynamics from general multivariate time-series data without explicit prior information about the dynamics. In this paper, we propose a convolutional autoencoder (CAE)-based DMD (CAE-DMD) to accurately model the underlying dynamics in videos. We develop a modified CAE model that encodes images into latent vectors, and we apply DMD to these latent vectors to extract DMD modes. The modes are either split into background and foreground modes for foreground modeling in videos or used for video classification. The latent vectors are then mapped back through a decoder to recover the input image sequences. We train the network end to end by minimizing the mean squared error between the original and reconstructed images. As a result, CAE-DMD accurately extracts the underlying dynamic information in the videos. We empirically evaluate CAE-DMD on two applications, background/foreground extraction and video classification, using synthetic and publicly available datasets.
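The abstract does not say exactly how DMD is computed on the latent vectors or how modes are assigned to background and foreground. The sketch below is a minimal illustration that rests on two assumptions. First, it uses standard exact DMD on a matrix whose columns are encoded latent vectors. Second, it applies the heuristic common in DMD-based background subtraction, where modes with near-zero continuous-time frequency (eigenvalue close to 1) are treated as background. The function names `exact_dmd` and `split_background`, the `rank` and `tol` parameters, and the synthetic latent data are all hypothetical. The CAE encoder and decoder are omitted.

```python
import numpy as np

# Hypothetical sketch: exact DMD on latent snapshots Z (latent_dim x T),
# standing in for the vectors a CAE encoder would produce frame by frame.


def exact_dmd(Z, rank):
    """Exact DMD truncated to `rank`; returns eigenvalues, modes, amplitudes."""
    X, Y = Z[:, :-1], Z[:, 1:]                      # snapshot pairs z_t -> z_{t+1}
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    A_tilde = U.conj().T @ Y @ V / s                # reduced operator U* Y V S^-1
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ V / s) @ W                         # exact DMD modes
    # Amplitudes fitted so that modes @ amps matches the first snapshot.
    amps = np.linalg.lstsq(modes, Z[:, 0].astype(complex), rcond=None)[0]
    return eigvals, modes, amps


def split_background(eigvals, modes, amps, T, dt=1.0, tol=1e-2):
    """Assumed heuristic: near-zero frequency modes form the static background."""
    omega = np.log(eigvals.astype(complex)) / dt    # continuous-time frequencies
    bg = np.abs(omega) < tol
    t = np.arange(T) * dt
    dynamics = amps[:, None] * np.exp(np.outer(omega, t))  # (rank, T) time evolution
    Z_bg = (modes[:, bg] @ dynamics[bg]).real
    Z_fg = (modes[:, ~bg] @ dynamics[~bg]).real
    return Z_bg, Z_fg


# Synthetic latent sequence: a constant "background" vector c plus an
# oscillating "foreground" component rotating at angular frequency theta.
rng = np.random.default_rng(0)
d, T, theta = 4, 50, 0.3
c, a, b = rng.standard_normal((3, d))
t = np.arange(T)
Z = c[:, None] + np.outer(a, np.cos(theta * t)) + np.outer(b, np.sin(theta * t))

eigvals, modes, amps = exact_dmd(Z, rank=3)
Z_bg, Z_fg = split_background(eigvals, modes, amps, T)
```

In this toy setting, the latent dynamics are exactly linear with eigenvalues 1 and e^{±i·theta}. As a result, the background reconstruction recovers the constant vector `c`, and the background and foreground parts sum back to `Z`. In CAE-DMD, the decoder would map `Z_bg` and `Z_fg` back to image space. The rank and threshold used here are illustrative choices, not values from the paper.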
All Science Journal Classification (ASJC) codes
- Signal Processing
- Computer Vision and Pattern Recognition