Enriching multilingual language resources by discovering missing cross-language links in Wikipedia

Jong Hoon Oh, Daisuke Kawahara, Kiyotaka Uchimoto, Jun'ichi Kazama, Kentaro Torisawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

Abstract

We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We first collect candidate missing cross-language links, i.e., pairs of English and Japanese Wikipedia articles that could be connected by a cross-language link. We then select the correct cross-language links from among the candidates using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, it discovers cross-language links with high accuracy (92% precision at 78% recall). Second, the features used by the classifier are language-independent. Third, the features are generated from resources automatically obtained from Wikipedia, without relying on any external knowledge. In this work, we discover approximately 10⁵ missing cross-language links in Wikipedia, almost two-thirds as many as the existing cross-language links in Wikipedia.
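The abstract describes a two-stage process: collect candidate English-Japanese article pairs, then keep the pairs a classifier accepts, and report precision and recall. The following is a minimal sketch under loudly stated assumptions, not the authors' code: the candidates are given up front, the trained classifier is replaced by a single hypothetical "link overlap" score with a fixed threshold, and the names `link_overlap`, `select_links`, `precision_recall`, and all toy data are illustrative inventions. The overlap score is only one example of a language-independent feature built from Wikipedia itself (outgoing links plus already-known cross-language links); the paper's actual feature set is not given here.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (English article title, Japanese article title)


def link_overlap(en: str, ja: str,
                 en_links: Dict[str, Set[str]],
                 ja_links: Dict[str, Set[str]],
                 known_links: Dict[str, str]) -> float:
    """Hypothetical feature: fraction of the English article's outgoing links
    whose known Japanese counterpart is also linked from the Japanese article."""
    # Map English link targets to Japanese titles via existing cross-language links.
    mapped = {known_links[t] for t in en_links.get(en, set()) if t in known_links}
    if not mapped:
        return 0.0
    return len(mapped & ja_links.get(ja, set())) / len(mapped)


def select_links(candidates: List[Pair],
                 score: Callable[[str, str], float],
                 threshold: float = 0.5) -> Set[Pair]:
    """Keep candidates whose score reaches the threshold
    (a stand-in for a trained classifier's accept decision)."""
    return {pair for pair in candidates if score(*pair) >= threshold}


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Precision = correct / predicted; recall = correct / gold."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    en_links = {"Tokyo": {"Japan", "Kanto"}, "Kyoto": {"Japan"}}
    ja_links = {"東京都": {"日本", "関東地方"}, "京都市": {"日本"}}
    known = {"Japan": "日本", "Kanto": "関東地方"}
    candidates = [("Tokyo", "東京都"), ("Tokyo", "京都市"), ("Kyoto", "京都市")]

    def score(en: str, ja: str) -> float:
        return link_overlap(en, ja, en_links, ja_links, known)

    selected = select_links(candidates, score, threshold=0.75)
    gold = {("Tokyo", "東京都"), ("Kyoto", "京都市")}
    print(sorted(selected))
    print(precision_recall(selected, gold))
```

In this toy run the wrong pair ("Tokyo", "京都市") shares only one of two mapped links and scores 0.5, so the 0.75 threshold filters it out. Raising the threshold generally trades recall for precision, which is the kind of balance summarized by the reported 92% precision at 78% recall.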

Original language: English
Host publication title: Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
Pages: 322-328
Number of pages: 7
Publication status: Published - Dec 1 2008
Event: 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008 - Sydney, NSW, Australia
Duration: Dec 9 2008 → Dec 12 2008

Publication series

Name: Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008

Other

Other: 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
Country/Territory: Australia
City: Sydney, NSW
Period: Dec 9 2008 → Dec 12 2008

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Electrical and Electronic Engineering
