Automatic acquisition of basic Katakana lexicon from a given corpus

Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Katakana, the Japanese phonogram script mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic segmentation method for Japanese Katakana compounds, which makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium- or large-sized Japanese corpus of some domain.

Original language: English
Host publication title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 682-693
Number of pages: 12
Publication status: Published - 2005
Externally published: Yes
Event: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005 - Jeju Island, Republic of Korea
Duration: Oct 11 2005 to Oct 13 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3651 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005
Country/Territory: Republic of Korea
City: Jeju Island
Period: 10/11/05 to 10/13/05

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
