Exploiting shared Chinese characters in Chinese word segmentation optimization for Chinese-Japanese Machine Translation

Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

Research output: Contribution to conference › Paper › peer-review

11 Citations (Scopus)

Abstract

Unknown words and word segmentation granularity are two main problems in Chinese word segmentation for Chinese-Japanese Machine Translation (MT). In this paper, we propose an approach that exploits common Chinese characters shared between Chinese and Japanese to optimize Chinese word segmentation for MT, aiming to solve these problems. We augment the system dictionary of a Chinese segmenter by extracting Chinese lexicons from a parallel training corpus. In addition, we adjust the granularity of the training data for the Chinese segmenter to that of Japanese. Experimental results of Chinese-Japanese MT on a phrase-based SMT system show that our approach improves MT performance significantly.
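The dictionary-augmentation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mapping table `KANJI_TO_HANZI`, the functions `to_hanzi` and `extract_lexicon`, and the sample sentence pairs are all hypothetical, and the substring check is a simplifying assumption standing in for whatever alignment the paper actually uses. The sketch only shows how shared characters could let a segmented Japanese word propose a Chinese dictionary entry.

```python
# Hypothetical toy mapping from Japanese kanji to simplified Chinese characters.
# A real system would need a full character mapping table; this one is illustrative only.
KANJI_TO_HANZI = {
    "機": "机",
    "械": "械",
    "翻": "翻",
    "訳": "译",
    "経": "经",
    "済": "济",
}


def to_hanzi(japanese_word):
    """Map a Japanese word to Chinese characters, or return None if any character is unmapped."""
    converted = []
    for ch in japanese_word:
        if ch not in KANJI_TO_HANZI:
            return None  # contains kana or a kanji missing from the toy table
        converted.append(KANJI_TO_HANZI[ch])
    return "".join(converted)


def extract_lexicon(parallel_pairs):
    """Collect Chinese word candidates from (chinese_sentence, japanese_words) pairs."""
    lexicon = set()
    for chinese_sentence, japanese_words in parallel_pairs:
        for word in japanese_words:
            candidate = to_hanzi(word)
            # Assumption: keep the candidate only if it literally occurs on the Chinese side.
            if candidate and candidate in chinese_sentence:
                lexicon.add(candidate)
    return lexicon


# Made-up parallel data: an unsegmented Chinese sentence and a segmented Japanese sentence.
pairs = [
    ("机械翻译很有用", ["機械", "翻訳", "は", "便利", "だ"]),
    ("经济发展", ["経済", "発展"]),
]
print(extract_lexicon(pairs))
```

Words containing kana or characters missing from the toy table (such as 発展 here) are simply skipped. The resulting entries could then be added to a segmenter's dictionary so that words unknown to it are kept as single units.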

Original language: English
Pages: 35-42
Number of pages: 8
Publication status: Published - Jan 1 2012
Event: 16th Annual Conference of the European Association for Machine Translation, EAMT 2012 - Trento, Italy
Duration: May 28 2012 → May 30 2012

Other

Other: 16th Annual Conference of the European Association for Machine Translation, EAMT 2012
Country/Territory: Italy
City: Trento
Period: 5/28/12 → 5/30/12

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Software
