TY - GEN
T1 - Semi-supervised coupled dictionary learning for cross-modal retrieval in internet images and texts
AU - Xu, Xing
AU - Yang, Yang
AU - Shimada, Atsushi
AU - Taniguchi, Rin Ichiro
AU - He, Li
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/10/13
Y1 - 2015/10/13
N2 - Nowadays, massive amounts of images and texts have been emerging on the Internet, arousing the demand for effective cross-modal retrieval. To eliminate the heterogeneity between the modalities of images and texts, existing subspace learning methods try to learn a common latent subspace under which cross-modal matching can be performed. However, these methods usually require fully paired samples (images with corresponding texts) and also ignore the class label information along with the paired samples. Indeed, the class label information can reduce the semantic gap between different modalities and explicitly guide the subspace learning procedure. In addition, the large quantities of unpaired samples (images or texts) may provide useful side information to enrich the representations from the learned subspace. Thus, in this paper we propose a novel model for the cross-modal retrieval problem. It consists of 1) a semi-supervised coupled dictionary learning step to generate homogeneously sparse representations for different modalities based on both paired and unpaired samples; and 2) a coupled feature mapping step to project the sparse representations of different modalities into a common subspace defined by class label information to perform cross-modal matching. Experiments on a large-scale web image dataset, MIRFlickr-1M, with both fully paired and unpaired settings show the effectiveness of the proposed model on the cross-modal retrieval task.
AB - Nowadays, massive amounts of images and texts have been emerging on the Internet, arousing the demand for effective cross-modal retrieval. To eliminate the heterogeneity between the modalities of images and texts, existing subspace learning methods try to learn a common latent subspace under which cross-modal matching can be performed. However, these methods usually require fully paired samples (images with corresponding texts) and also ignore the class label information along with the paired samples. Indeed, the class label information can reduce the semantic gap between different modalities and explicitly guide the subspace learning procedure. In addition, the large quantities of unpaired samples (images or texts) may provide useful side information to enrich the representations from the learned subspace. Thus, in this paper we propose a novel model for the cross-modal retrieval problem. It consists of 1) a semi-supervised coupled dictionary learning step to generate homogeneously sparse representations for different modalities based on both paired and unpaired samples; and 2) a coupled feature mapping step to project the sparse representations of different modalities into a common subspace defined by class label information to perform cross-modal matching. Experiments on a large-scale web image dataset, MIRFlickr-1M, with both fully paired and unpaired settings show the effectiveness of the proposed model on the cross-modal retrieval task.
UR - http://www.scopus.com/inward/record.url?scp=84962821124&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962821124&partnerID=8YFLogxK
U2 - 10.1145/2733373.2806346
DO - 10.1145/2733373.2806346
M3 - Conference contribution
AN - SCOPUS:84962821124
T3 - MM 2015 - Proceedings of the 2015 ACM Multimedia Conference
SP - 847
EP - 850
BT - MM 2015 - Proceedings of the 2015 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
T2 - 23rd ACM International Conference on Multimedia, MM 2015
Y2 - 26 October 2015 through 30 October 2015
ER -