Enriching multilingual language resources by discovering missing cross-language links in Wikipedia

Jong Hoon Oh, Daisuke Kawahara, Kiyotaka Uchimoto, Jun'ichi Kazama, Kentaro Torisawa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

13 Citations (Scopus)

Abstract

We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We first collect candidate missing cross-language links: pairs of English and Japanese Wikipedia articles that could be connected by a cross-language link. We then select the correct links among the candidates using a classifier trained on various types of features. Our method has three desirable characteristics for discovering missing links. First, it discovers cross-language links with high accuracy (92% precision at 78% recall). Second, the features used in the classifier are language-independent. Third, the features are generated from resources obtained automatically from Wikipedia, without relying on any external knowledge. In this work, we discover approximately 10^5 missing cross-language links from Wikipedia, almost two-thirds as many as the existing cross-language links in Wikipedia.
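The precision and recall figures quoted in the abstract can be read as set overlaps between the links a classifier accepts and a gold-standard set of correct links. The following is a minimal sketch of that computation only, not the authors' evaluation code; the function name `precision_recall`, the `Link` type, and the toy article pairs are assumptions made up for illustration.

```python
from typing import Set, Tuple

# A candidate link is a pair (English article title, Japanese article title).
# This representation is an assumption for illustration.
Link = Tuple[str, str]


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted links against gold links."""
    true_positives = len(predicted & gold)
    # Precision: share of accepted candidates that are correct links.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of correct links that were accepted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented purely for illustration (not from the paper).
gold = {("Tokyo", "東京"), ("Kyoto", "京都"), ("Osaka", "大阪"), ("Nara", "奈良")}
predicted = {("Tokyo", "東京"), ("Kyoto", "京都"), ("Osaka", "大坂")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the three accepted pairs are correct and two of the four gold pairs are found, so precision and recall differ, which is the same trade-off the reported 92% and 78% figures describe.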

Original language: English
Title of host publication: Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
Pages: 322-328
Number of pages: 7
DOIs: https://doi.org/10.1109/WIIAT.2008.317
Publication status: Published - Dec 1 2008
Event: 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008 - Sydney, NSW, Australia
Duration: Dec 9 2008 to Dec 12 2008

Publication series

Name: Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008

Other

Other: 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
Country: Australia
City: Sydney, NSW
Period: 12/9/08 to 12/12/08

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

Oh, J. H., Kawahara, D., Uchimoto, K., Kazama, J., & Torisawa, K. (2008). Enriching multilingual language resources by discovering missing cross-language links in Wikipedia. In Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008 (pp. 322-328). [4740467] (Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008). https://doi.org/10.1109/WIIAT.2008.317

@inproceedings{17ddecd3a0914bb2b8fd7915e4701c89,
title = "Enriching multilingual language resources by discovering missing cross-language links in Wikipedia",
abstract = "We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We first collect candidate missing cross-language links: pairs of English and Japanese Wikipedia articles that could be connected by a cross-language link. We then select the correct links among the candidates using a classifier trained on various types of features. Our method has three desirable characteristics for discovering missing links. First, it discovers cross-language links with high accuracy (92{\%} precision at 78{\%} recall). Second, the features used in the classifier are language-independent. Third, the features are generated from resources obtained automatically from Wikipedia, without relying on any external knowledge. In this work, we discover approximately $10^5$ missing cross-language links from Wikipedia, almost two-thirds as many as the existing cross-language links in Wikipedia.",
author = "Oh, {Jong Hoon} and Daisuke Kawahara and Kiyotaka Uchimoto and Jun'ichi Kazama and Kentaro Torisawa",
year = "2008",
month = "12",
day = "1",
doi = "10.1109/WIIAT.2008.317",
language = "English",
isbn = "9780769534961",
series = "Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008",
pages = "322--328",
booktitle = "Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008",

}

TY  - GEN
T1  - Enriching multilingual language resources by discovering missing cross-language links in Wikipedia
AU  - Oh, Jong Hoon
AU  - Kawahara, Daisuke
AU  - Uchimoto, Kiyotaka
AU  - Kazama, Jun'ichi
AU  - Torisawa, Kentaro
PY  - 2008/12/1
Y1  - 2008/12/1
AB  - We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We first collect candidate missing cross-language links: pairs of English and Japanese Wikipedia articles that could be connected by a cross-language link. We then select the correct links among the candidates using a classifier trained on various types of features. Our method has three desirable characteristics for discovering missing links. First, it discovers cross-language links with high accuracy (92% precision at 78% recall). Second, the features used in the classifier are language-independent. Third, the features are generated from resources obtained automatically from Wikipedia, without relying on any external knowledge. In this work, we discover approximately 10^5 missing cross-language links from Wikipedia, almost two-thirds as many as the existing cross-language links in Wikipedia.
UR  - http://www.scopus.com/inward/record.url?scp=62949243450&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=62949243450&partnerID=8YFLogxK
U2  - 10.1109/WIIAT.2008.317
DO  - 10.1109/WIIAT.2008.317
M3  - Conference contribution
AN  - SCOPUS:62949243450
SN  - 9780769534961
T3  - Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
SP  - 322
EP  - 328
BT  - Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
ER  -