Techniques of automatic extraction of related words are of great importance in many applications such as query expansion and automatic thesaurus construction. In this paper, a method of extracting related words is proposed basing on the statistical information about the co-occurrences of words from huge corpora. The mutual information is one of such statistical measures and has been used for application mainly in natural language processing. A drawback is, however, the mutual information depends mainly on frequencies of words. To overcome this difficulty, we propose as a new measure a normalize deviation of mutual information. We also reveal a correspondence between word ambiguity and related words using word relation graphs constructed using this measure.
|Number of pages||8|
|Journal||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Publication status||Published - Dec 1 2003|
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science(all)