TY - GEN
T1 - Lyric video analysis using text detection and tracking
AU - Sakaguchi, Shota
AU - Kato, Jun
AU - Goto, Masataka
AU - Uchida, Seiichi
N1 - Funding Information: This work was supported by JSPS KAKENHI Grant Number JP17H06100.
PY - 2020
Y1 - 2020
N2 - We attempt to recognize and track lyric words in lyric videos. A lyric video is a music video that shows the lyrics of a song. The main characteristic of lyric videos is that the lyric words appear in the frames in synchrony with the music. Recognizing and tracking the lyric words is difficult because (1) the words are often decorated and geometrically distorted and (2) the words move arbitrarily and drastically within the video frame. The purpose of this paper is to analyze the motion of the lyric words in lyric videos as the first step toward automatic lyric video generation. To analyze the motion of lyric words, we first apply a state-of-the-art scene text detector and recognizer to each video frame. Then, lyric-frame matching is performed to establish the optimal correspondence between the lyric words and the frames. After determining the motion trajectories of individual lyric words from this correspondence, we analyze the trajectories by k-medoids clustering and dynamic time warping (DTW).
UR - http://www.scopus.com/inward/record.url?scp=85090098010&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090098010&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-57058-3_30
DO - 10.1007/978-3-030-57058-3_30
M3 - Conference contribution
AN - SCOPUS:85090098010
SN - 9783030570576
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 426
EP - 440
BT - Document Analysis Systems - 14th IAPR International Workshop, DAS 2020, Proceedings
A2 - Bai, Xiang
A2 - Karatzas, Dimosthenis
A2 - Lopresti, Daniel
PB - Springer
T2 - 14th IAPR International Workshop on Document Analysis Systems, DAS 2020
Y2 - 26 July 2020 through 29 July 2020
ER -