A method of extracting related words using standardized mutual information

Tomohiko Sugimachi, Akira Ishino, Masayuki Takeda, Fumihiro Matsuo

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Techniques for automatically extracting related words are of great importance in many applications, such as query expansion and automatic thesaurus construction. In this paper, we propose a method of extracting related words based on statistical information about word co-occurrences in huge corpora. Mutual information is one such statistical measure and has been used mainly in natural language processing. A drawback, however, is that mutual information depends mainly on the frequencies of words. To overcome this difficulty, we propose a new measure, a normalized deviation of mutual information. We also reveal a correspondence between word ambiguity and related words using word relation graphs constructed with this measure.
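The frequency dependence mentioned in the abstract can be illustrated with a minimal sketch of plain pointwise mutual information. This is not the authors' implementation: the function name `pmi_scores`, the use of document-level co-occurrence, and the toy corpus are all assumptions made for illustration. The abstract does not give the definition of the proposed normalized measure, so it is not sketched here.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return pointwise mutual information for each co-occurring word pair.

    Assumption: two words "co-occur" when they appear in the same document,
    and probabilities are estimated as document frequencies divided by the
    number of documents. The paper may define co-occurrence differently.
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in documents:
        words = sorted(set(doc))  # count each word once per document
        word_df.update(words)
        pair_df.update(combinations(words, 2))

    scores = {}
    for (x, y), n_xy in pair_df.items():
        p_xy = n_xy / n_docs
        p_x = word_df[x] / n_docs
        p_y = word_df[y] / n_docs
        # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (hypothetical): "a" and "b" co-occur twice,
# while the rare words "d" and "e" co-occur only once.
toy_corpus = [["a", "b"], ["a", "b"], ["a", "c"], ["d", "e"]]
toy_scores = pmi_scores(toy_corpus)
```

In this toy corpus the pair ("d", "e"), seen together only once, scores higher than ("a", "b"), seen together twice, because both "d" and "e" are rare. Since p(x, y) can never exceed the smaller of p(x) and p(y), the largest attainable score grows as the words become rarer, which is the kind of frequency dependence the abstract describes as a drawback.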

Original language: English
Pages (from-to): 478-485
Number of pages: 8
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2843
Publication status: Published - Dec 1 2003

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
