TY - GEN
T1 - Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking
AU - Asanomi, Takanori
AU - Nishimura, Kazuya
AU - Bise, Ryoma
N1 - Funding Information:
This paper proposed a point-level multiple-object tracking method that can track small human heads in video captured by a drone. The method aligns object features in the feature map by feature-level warping and aggregates image features from multiple frames by multi-frame attention, which enables it to use multi-frame context effectively. Experiments demonstrated that the method could effectively use multi-frame context and outperformed the state-of-the-art method on the DroneCrowd dataset. Acknowledgment: This work was supported by JSPS KAKENHI Grant Number JP21K19829.
Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Drone crowd tracking has various applications such as crowd management and video surveillance. Unlike in general multi-object tracking, the objects to be tracked are small, and the ground truth is given as point-level annotations, which carry no region information. This causes a lack of discriminative features for finding the same objects among many similar objects. Thus, similarity-based tracking techniques, which are widely used for multi-object tracking with bounding boxes, are difficult to apply. To deal with this problem, we take into account the temporal context of the local area. To aggregate temporal context in a local area, we propose multi-frame attention with feature-level warping. Feature-level warping aligns the features of the same object across multiple frames, and multi-frame attention then effectively aggregates the temporal context from the warped features. The experimental results show the effectiveness of our method. Our method outperformed the state-of-the-art method on the DroneCrowd dataset. The code is publicly available at https://github.com/asanomitakanori/mfa-feature-warping.
AB - Drone crowd tracking has various applications such as crowd management and video surveillance. Unlike in general multi-object tracking, the objects to be tracked are small, and the ground truth is given as point-level annotations, which carry no region information. This causes a lack of discriminative features for finding the same objects among many similar objects. Thus, similarity-based tracking techniques, which are widely used for multi-object tracking with bounding boxes, are difficult to apply. To deal with this problem, we take into account the temporal context of the local area. To aggregate temporal context in a local area, we propose multi-frame attention with feature-level warping. Feature-level warping aligns the features of the same object across multiple frames, and multi-frame attention then effectively aggregates the temporal context from the warped features. The experimental results show the effectiveness of our method. Our method outperformed the state-of-the-art method on the DroneCrowd dataset. The code is publicly available at https://github.com/asanomitakanori/mfa-feature-warping.
UR - http://www.scopus.com/inward/record.url?scp=85149044671&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149044671&partnerID=8YFLogxK
U2 - 10.1109/WACV56688.2023.00171
DO - 10.1109/WACV56688.2023.00171
M3 - Conference contribution
AN - SCOPUS:85149044671
T3 - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
SP - 1664
EP - 1673
BT - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023
Y2 - 3 January 2023 through 7 January 2023
ER -
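
Note: the following is a minimal, hypothetical PyTorch sketch of the two ideas named in the abstract above, not the authors' implementation (see the linked repository for that). Per-frame feature maps are warped toward the current frame with a given flow field, and a standard multi-head attention then aggregates the warped features across frames at each spatial location. The names (warp_features, MultiFrameAttention), the use of grid_sample for warping, and all shapes are assumptions made for illustration only.

# Hypothetical sketch of feature-level warping + multi-frame attention (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_features(feat, flow):
    # feat: (B, C, H, W) feature map; flow: (B, 2, H, W) displacement in pixels, (x, y) order.
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)        # (2, H, W), (x, y)
    coords = base.unsqueeze(0) + flow                                  # (B, 2, H, W)
    # Normalize pixel coordinates to [-1, 1] as expected by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                               # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)


class MultiFrameAttention(nn.Module):
    # Aggregate T warped feature maps per spatial location with multi-head attention;
    # the current (last) frame is the query, all frames serve as keys/values.
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, warped):                                         # warped: (B, T, C, H, W)
        b, t, c, h, w = warped.shape
        tokens = warped.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        query = tokens[:, -1:, :]                                      # current-frame token
        fused, _ = self.attn(query, tokens, tokens)                    # (B*H*W, 1, C)
        return fused.reshape(b, h, w, c).permute(0, 3, 1, 2)           # (B, C, H, W)


if __name__ == "__main__":
    B, T, C, H, W = 1, 3, 32, 64, 64
    feats = torch.randn(B, T, C, H, W)                  # per-frame feature maps
    flows = torch.zeros(B, T, 2, H, W)                  # flow of each frame toward the current frame
    warped = torch.stack([warp_features(feats[:, i], flows[:, i]) for i in range(T)], dim=1)
    print(MultiFrameAttention(C)(warped).shape)         # torch.Size([1, 32, 64, 64])

In this sketch the flow fields are supplied externally (zeros in the toy example); how the alignment is actually estimated and how the aggregated features feed the point-level tracker are details of the paper, not of this illustration.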