TY - GEN
T1 - Coupled dictionary learning and feature mapping for cross-modal retrieval
AU - Xu, Xing
AU - Shimada, Atsushi
AU - Taniguchi, Rin-Ichiro
AU - He, Li
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/8/4
Y1 - 2015/8/4
N2 - In this paper, we investigate the problem of modeling images and associated text for cross-modal retrieval tasks such as text-to-image search and image-to-text search. To make the data from the image and text modalities comparable, previous cross-modal retrieval methods directly learn two projection matrices that map the raw features of the two modalities into a common subspace, in which cross-modal data matching can be performed. However, the differing feature representations and correlation structures of the two modalities prevent these methods from efficiently modeling the relationships across modalities through a common subspace. To handle this diversity, we first leverage coupled dictionary learning to generate homogeneous sparse representations for the different modalities by associating and jointly updating their dictionaries. We then use a coupled feature mapping scheme to project the derived sparse representations from the different modalities into a common subspace in which cross-modal retrieval can be performed. Experiments on a variety of cross-modal retrieval tasks demonstrate that the proposed method outperforms state-of-the-art approaches.
AB - In this paper, we investigate the problem of modeling images and associated text for cross-modal retrieval tasks such as text-to-image search and image-to-text search. To make the data from the image and text modalities comparable, previous cross-modal retrieval methods directly learn two projection matrices that map the raw features of the two modalities into a common subspace, in which cross-modal data matching can be performed. However, the differing feature representations and correlation structures of the two modalities prevent these methods from efficiently modeling the relationships across modalities through a common subspace. To handle this diversity, we first leverage coupled dictionary learning to generate homogeneous sparse representations for the different modalities by associating and jointly updating their dictionaries. We then use a coupled feature mapping scheme to project the derived sparse representations from the different modalities into a common subspace in which cross-modal retrieval can be performed. Experiments on a variety of cross-modal retrieval tasks demonstrate that the proposed method outperforms state-of-the-art approaches.
UR - http://www.scopus.com/inward/record.url?scp=84946014733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946014733&partnerID=8YFLogxK
U2 - 10.1109/ICME.2015.7177396
DO - 10.1109/ICME.2015.7177396
M3 - Conference contribution
AN - SCOPUS:84946014733
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2015 IEEE International Conference on Multimedia and Expo, ICME 2015
PB - IEEE Computer Society
T2 - IEEE International Conference on Multimedia and Expo, ICME 2015
Y2 - 29 June 2015 through 3 July 2015
ER -
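
The abstract above outlines a two-stage pipeline: coupled dictionary learning to obtain homogeneous sparse codes, followed by a coupled feature mapping into a common subspace where retrieval is performed. The Python sketch below, assuming scikit-learn and synthetic stand-in features, illustrates that general idea only; the joint-dictionary approximation, the ridge-regression mappings, and every variable name here are illustrative assumptions, not the method as published in the paper.

# Minimal, hypothetical sketch (not the authors' implementation): it approximates
# coupled dictionary learning by fitting one dictionary on concatenated image/text
# features so the sparse codes are shared, then learns per-modality ridge mappings
# into a common space. All data and parameter values below are illustrative.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, d_img, d_txt, k = 200, 64, 48, 32              # samples, feature dims, dictionary atoms

X_img = rng.standard_normal((n, d_img))           # stand-in image features
X_txt = rng.standard_normal((n, d_txt))           # stand-in text features
Y = rng.standard_normal((n, 10))                  # stand-in common semantic space (e.g. labels)

# "Coupled" dictionary learning, approximated by a joint dictionary over both modalities.
joint = DictionaryLearning(n_components=k, alpha=1.0, max_iter=200, random_state=0)
joint.fit(np.hstack([X_img, X_txt]))
D_img = joint.components_[:, :d_img]              # image dictionary (k x d_img)
D_txt = joint.components_[:, d_img:]              # text dictionary  (k x d_txt)

# Sparse-code each modality against its own dictionary.
A_img = sparse_encode(X_img, D_img, alpha=1.0)    # (n x k) image sparse codes
A_txt = sparse_encode(X_txt, D_txt, alpha=1.0)    # (n x k) text sparse codes

# Feature mapping: ridge regression stands in for the paper's coupled mapping scheme.
P_img = Ridge(alpha=0.1).fit(A_img, Y)
P_txt = Ridge(alpha=0.1).fit(A_txt, Y)

# Image-to-text retrieval: rank text items by cosine similarity in the common space.
q = P_img.predict(A_img[:1])                      # one image query
gallery = P_txt.predict(A_txt)                    # text gallery
sims = (gallery @ q.T).ravel() / (np.linalg.norm(gallery, axis=1) * np.linalg.norm(q) + 1e-12)
print(np.argsort(-sims)[:5])                      # indices of the top-5 retrieved texts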