Automatic acquisition of basic Katakana lexicon from a given corpus

Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Katakana, the Japanese phonogram script used mainly for loanwords, is a troublemaker in Japanese word segmentation. Because Katakana words are heavily domain-dependent and Katakana neologisms are numerous, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic method for segmenting Japanese Katakana compounds, which makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium-sized or large Japanese corpus from some domain.
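
The abstract does not spell out the segmentation procedure, so the following is only a rough sketch of the general idea of corpus-driven Katakana compound splitting, not the authors' method. It assumes that a compound should be split when both of its parts are attested in the corpus more strongly than the compound itself. The function names, the geometric-mean scoring rule, and the `min_len` parameter are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Katakana letters (U+30A1..U+30FA) plus the prolonged sound mark (U+30FC).
KATAKANA_RUN = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def katakana_counts(corpus):
    """Count every maximal Katakana run in an iterable of sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(KATAKANA_RUN.findall(sentence))
    return counts


def split_compound(word, counts, min_len=2):
    """Try every single split point and keep the best-attested one.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    split only if the geometric mean of the two part frequencies exceeds
    the frequency of the unsplit word.
    """
    best, best_score = [word], counts[word]
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        score = (counts[left] * counts[right]) ** 0.5
        if score > best_score:
            best, best_score = [left, right], score
    return best


if __name__ == "__main__":
    # Tiny made-up corpus, for illustration only.
    corpus = [
        "データベースを使う",
        "データを見る",
        "データを送る",
        "ベースを作る",
        "ベースがある",
    ]
    counts = katakana_counts(corpus)
    print(split_compound("データベース", counts))
```

In this toy corpus the parts データ and ベース each occur more often on their own than the compound データベース does, so the sketch splits the compound; words whose parts are unattested are kept whole. A realistic system would need recursive splitting and a far larger corpus.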

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 682-693
Number of pages: 12
DOIs: https://doi.org/10.1007/11562214_60
Publication status: Published - Dec 1 2005
Event: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005 - Jeju Island, Korea, Republic of
Duration: Oct 11 2005 to Oct 13 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3651 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005
Country: Korea, Republic of
City: Jeju Island
Period: 10/11/05 to 10/13/05

Fingerprint

Glossaries
Segmentation
Dependent
Corpus
Dictionary
Acquisition

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Nakazawa, T., Kawahara, D., & Kurohashi, S. (2005). Automatic acquisition of basic Katakana lexicon from a given corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 682-693). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3651 LNAI). https://doi.org/10.1007/11562214_60

Automatic acquisition of basic Katakana lexicon from a given corpus. / Nakazawa, Toshiaki; Kawahara, Daisuke; Kurohashi, Sadao.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2005. p. 682-693 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3651 LNAI).


Nakazawa, T, Kawahara, D & Kurohashi, S 2005, Automatic acquisition of basic Katakana lexicon from a given corpus. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3651 LNAI, pp. 682-693, 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005, Jeju Island, Korea, Republic of, 10/11/05. https://doi.org/10.1007/11562214_60
Nakazawa T, Kawahara D, Kurohashi S. Automatic acquisition of basic Katakana lexicon from a given corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2005. p. 682-693. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/11562214_60
Nakazawa, Toshiaki ; Kawahara, Daisuke ; Kurohashi, Sadao. / Automatic acquisition of basic Katakana lexicon from a given corpus. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2005. pp. 682-693 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{b4f3ef66372146a1a883560160c37807,
title = "Automatic acquisition of basic Katakana lexicon from a given corpus",
abstract = "Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.",
author = "Toshiaki Nakazawa and Daisuke Kawahara and Sadao Kurohashi",
year = "2005",
month = "12",
day = "1",
doi = "10.1007/11562214_60",
language = "English",
isbn = "3540291725",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "682--693",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Automatic acquisition of basic Katakana lexicon from a given corpus

AU - Nakazawa, Toshiaki

AU - Kawahara, Daisuke

AU - Kurohashi, Sadao

PY - 2005/12/1

Y1 - 2005/12/1

N2 - Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.

AB - Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.

UR - http://www.scopus.com/inward/record.url?scp=33645990280&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33645990280&partnerID=8YFLogxK

U2 - 10.1007/11562214_60

DO - 10.1007/11562214_60

M3 - Conference contribution

AN - SCOPUS:33645990280

SN - 3540291725

SN - 9783540291725

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 682

EP - 693

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -