### Abstract

The amount of multimedia data such as images and text on social websites is growing exponentially, fueling the demand for effective and efficient cross-modal retrieval. Cross-modal hashing methods have attracted considerable attention recently because they learn compact binary codes for heterogeneous data, enabling large-scale similarity search. To construct the cross-correlation between different modalities, these methods generally seek a joint abstraction space into which the heterogeneous data can be projected; a quantization rule is then applied to convert the abstraction representation into binary codes. However, such methods may not effectively bridge the semantic gap through the latent abstraction space, because they fail to capture latent information shared between heterogeneous data. In addition, most of them apply the simplest quantization scheme (i.e., the sign function), which can discard information from the abstraction representation and yield inferior binary codes. To address these challenges, this paper presents a novel cross-modal hashing method that generates unified binary codes combining different modalities. Specifically, we first extract semantic features from the image and text modalities to capture latent information. These semantic features are then projected into a joint abstraction space. Finally, the abstraction space is rotated to produce better unified binary codes with much lower quantization loss, while preserving the locality structure of the projected data. We integrate these binary code learning procedures into an iterative algorithm to obtain optimal solutions. Moreover, we exploit class label information to further reduce the semantic gap between modalities and benefit the binary code learning.
Extensive experiments on four multimedia datasets show that the proposed binary coding schemes outperform several state-of-the-art methods in cross-modal scenarios.
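The abstract's final step — rotating the joint abstraction space to reduce quantization loss before taking signs — follows the general idea of iterative quantization. The paper's exact objective is not given in this abstract, so the sketch below is only a generic ITQ-style rotation in NumPy, assuming a real-valued joint embedding `V` has already been obtained; the function name `itq_style_rotation` and its parameters are illustrative, not the authors' formulation.

```python
import numpy as np

def itq_style_rotation(V, n_iter=50, seed=0):
    """Find an orthogonal rotation R that reduces the quantization loss
    ||B - V @ R||_F^2 with B = sign(V @ R), by alternating minimization.

    V : (n_samples, n_bits) real-valued joint embedding.
    Returns the binary codes (as 0/1 uint8) and the rotation matrix.
    """
    rng = np.random.default_rng(seed)
    n_bits = V.shape[1]
    # Initialize R as a random orthogonal matrix via QR decomposition.
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        # Step 1: fix R, update B by elementwise sign (map zeros to +1).
        B = np.sign(V @ R)
        B[B == 0] = 1
        # Step 2: fix B, update R by solving the orthogonal Procrustes
        # problem: SVD of B^T V gives the loss-minimizing rotation.
        U, _, Vt = np.linalg.svd(B.T @ V)
        R = Vt.T @ U.T
    B = np.sign(V @ R)
    B[B == 0] = 1
    return (B > 0).astype(np.uint8), R
```

Each iteration alternates a closed-form update of the codes (sign thresholding) with a closed-form update of the rotation (Procrustes via SVD), so the quantization loss is non-increasing; this mirrors the "iterative algorithm for optimal solutions" the abstract describes, though the paper also incorporates locality preservation and class labels not sketched here.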

| Original language | English |
| --- | --- |
| Pages (from-to) | 191-203 |
| Number of pages | 13 |
| Journal | Neurocomputing |
| Volume | 213 |
| DOI | https://doi.org/10.1016/j.neucom.2015.11.133 |
| Publication status | Published - 12 Nov 2016 |

### All Science Journal Classification (ASJC) codes

- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence

### Cite this

Xu, X., He, L., Shimada, A., Taniguchi, R.-I., & Lu, H. (2016). **Learning unified binary codes for cross-modal retrieval via latent semantic hashing.** *Neurocomputing*, *213*, 191-203. https://doi.org/10.1016/j.neucom.2015.11.133

Research output: Contribution to journal › Article

TY - JOUR

T1 - Learning unified binary codes for cross-modal retrieval via latent semantic hashing

AU - Xu, Xing

AU - He, Li

AU - Shimada, Atsushi

AU - Taniguchi, Rin-Ichiro

AU - Lu, Huimin

PY - 2016/11/12

Y1 - 2016/11/12

N2 - The amount of multimedia data such as images and text on social websites is growing exponentially, fueling the demand for effective and efficient cross-modal retrieval. Cross-modal hashing methods have attracted considerable attention recently because they learn compact binary codes for heterogeneous data, enabling large-scale similarity search. To construct the cross-correlation between different modalities, these methods generally seek a joint abstraction space into which the heterogeneous data can be projected; a quantization rule is then applied to convert the abstraction representation into binary codes. However, such methods may not effectively bridge the semantic gap through the latent abstraction space, because they fail to capture latent information shared between heterogeneous data. In addition, most of them apply the simplest quantization scheme (i.e., the sign function), which can discard information from the abstraction representation and yield inferior binary codes. To address these challenges, this paper presents a novel cross-modal hashing method that generates unified binary codes combining different modalities. Specifically, we first extract semantic features from the image and text modalities to capture latent information. These semantic features are then projected into a joint abstraction space. Finally, the abstraction space is rotated to produce better unified binary codes with much lower quantization loss, while preserving the locality structure of the projected data. We integrate these binary code learning procedures into an iterative algorithm to obtain optimal solutions. Moreover, we exploit class label information to further reduce the semantic gap between modalities and benefit the binary code learning. Extensive experiments on four multimedia datasets show that the proposed binary coding schemes outperform several state-of-the-art methods in cross-modal scenarios.

UR - http://www.scopus.com/inward/record.url?scp=84994844574&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994844574&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2015.11.133

DO - 10.1016/j.neucom.2015.11.133

M3 - Article

VL - 213

SP - 191

EP - 203

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -