Abstract
Distributed word representations are used in many natural language processing tasks. When dealing with ambiguous words, it is desirable to generate multi-sense embeddings, i.e., multiple representations per word. Therefore, several methods have been proposed to generate different word representations based on parts of speech or topics, but these methods tend to be too coarse to deal with ambiguity. In this paper, we propose methods to generate multiple word representations for each word based on dependency structure relations. In order to deal with the data sparseness problem caused by the increase in vocabulary size, the initial value of each word representation is determined using pre-trained word representations. It is expected that the representations of low-frequency words will remain in the vicinity of their initial values, which will in turn reduce the negative effects of data sparseness. Extensive evaluation results confirm the effectiveness of our methods, which significantly outperformed state-of-the-art methods for multi-sense embeddings. Detailed analysis of our method shows that the data sparseness problem is resolved due to the pre-training.
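The initialization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `init_multi_sense`, the relation labels (`nsubj`, `dobj`, `nmod`), the toy vectors, and the random fallback for words without a pre-trained vector are all assumptions made for illustration. The sketch only shows the starting point, where every (word, dependency relation) key receives its own copy of the word's pre-trained vector, so that keys with few training updates stay close to that vector.

```python
import numpy as np

def init_multi_sense(pretrained, observed_pairs, dim, seed=0):
    """Create one vector per (word, relation) key, copied from the word's pre-trained vector.

    pretrained:     dict mapping word -> 1-D array (assumed to come from any
                    single-sense embedding model)
    observed_pairs: iterable of (word, relation) keys seen in a parsed corpus
    dim:            embedding dimension, used only for the fallback case
    """
    rng = np.random.default_rng(seed)
    table = {}
    for word, relation in observed_pairs:
        base = pretrained.get(word)
        if base is None:
            # Hypothetical fallback: small random vector when no pre-trained
            # vector exists for the word (not described in the abstract).
            base = rng.normal(scale=0.1, size=dim)
        # np.array copies by default, so each key can be trained independently.
        table[(word, relation)] = np.array(base, dtype=float)
    return table

# Toy data; the vectors and relation labels are made up for illustration.
pretrained = {
    "bank": np.array([0.2, -0.1, 0.4]),
    "river": np.array([0.0, 0.3, 0.1]),
}
pairs = [("bank", "nsubj"), ("bank", "dobj"), ("river", "nmod")]
senses = init_multi_sense(pretrained, pairs, dim=3)
```

After initialization, `senses[("bank", "nsubj")]` and `senses[("bank", "dobj")]` hold equal values but are separate arrays. A subsequent training step, which this sketch leaves out, would move frequently observed keys apart while rarely observed keys remain near the shared starting vector.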
Original language | English |
---|---|
Pages | 28-36 |
Number of pages | 9 |
Publication status | Published - 2018 |
Event | 32nd Pacific Asia Conference on Language, Information and Computation, PACLIC 2018 - Hong Kong, Hong Kong. Duration: Dec 1 2018 → Dec 3 2018 |
Conference
会議 | 32nd Pacific Asia Conference on Language, Information and Computation, PACLIC 2018 |
---|---|
Country | Hong Kong |
City | Hong Kong |
Period | 12/1/18 → 12/3/18 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Computer Science (miscellaneous)