Chinese-Japanese machine translation exploiting Chinese characters

Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

Research output: Journal article

10 citations (Scopus)

Abstract

The Chinese and Japanese languages share Chinese characters. Since the Chinese characters in Japanese originated from ancient China, many common Chinese characters exist between these two languages. Since Chinese characters contain significant semantic information and common Chinese characters share the same meaning in the two languages, they can be quite useful in Chinese-Japanese machine translation (MT). We therefore propose a method for creating a Chinese character mapping table for Japanese, traditional Chinese, and simplified Chinese, with the aim of constructing a complete resource of common Chinese characters. Furthermore, we point out two main problems in Chinese word segmentation for Chinese-Japanese MT, namely, unknown words and word segmentation granularity, and propose an approach exploiting common Chinese characters to solve these problems. We also propose a statistical method for detecting semantically equivalent Chinese characters other than the common ones and a method for exploiting shared Chinese characters in phrase alignment. Results of the experiments carried out on a state-of-the-art phrase-based statistical MT system and an example-based MT system show that our proposed approaches can improve MT performance significantly, thereby verifying the effectiveness of shared Chinese characters for Chinese-Japanese MT.
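
As a rough illustration of the mapping-table idea mentioned in the abstract, the following Python sketch maps Japanese kanji to their traditional and simplified Chinese counterparts and uses the table to find characters shared by a Japanese string and a simplified Chinese string. The four table entries and the names `KANJI_TO_HANZI`, `to_simplified`, and `shared_characters` are hypothetical and chosen only for this example; this record does not describe how the authors build their table or how they apply it in segmentation and phrase alignment.

```python
# Hypothetical toy mapping table: Japanese kanji -> (traditional, simplified).
# The real resource described in the abstract is far larger; these entries
# are only well-known character correspondences used for illustration.
KANJI_TO_HANZI = {
    "国": ("國", "国"),
    "発": ("發", "发"),
    "両": ("兩", "两"),
    "学": ("學", "学"),
}


def to_simplified(japanese_text):
    """Replace each kanji found in the table with its simplified Chinese form.

    Characters missing from the table are left unchanged.
    """
    return "".join(KANJI_TO_HANZI.get(ch, (ch, ch))[1] for ch in japanese_text)


def shared_characters(japanese_text, chinese_text):
    """Return the Japanese characters whose simplified form occurs in chinese_text."""
    chinese_chars = set(chinese_text)
    return [
        ch
        for ch in japanese_text
        if KANJI_TO_HANZI.get(ch, (ch, ch))[1] in chinese_chars
    ]


if __name__ == "__main__":
    print(to_simplified("両国"))
    print(shared_characters("発展", "发展"))
```

Characters that are identical in both scripts (such as 展) need no table entry in this sketch, because unmapped characters fall back to themselves.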

Original language: English
Article number: 16
Journal: ACM Transactions on Asian Language Information Processing
Volume: 12
Issue number: 4
DOI: 10.1145/2523057.2523059
Publication status: Published - Oct 1, 2013

Fingerprint

Statistical methods
Semantics
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Chinese-Japanese machine translation exploiting Chinese characters. / Chu, Chenhui; Nakazawa, Toshiaki; Kawahara, Daisuke; Kurohashi, Sadao.

In: ACM Transactions on Asian Language Information Processing, Vol. 12, No. 4, 16, 01.10.2013.

Chu, Chenhui; Nakazawa, Toshiaki; Kawahara, Daisuke; Kurohashi, Sadao. / Chinese-Japanese machine translation exploiting Chinese characters. In: ACM Transactions on Asian Language Information Processing. 2013; Vol. 12, No. 4.
@article{5bf676c8a77e4a848dcb278b8e9f4910,
title = "Chinese-Japanese machine translation exploiting Chinese characters",
author = "Chenhui Chu and Toshiaki Nakazawa and Daisuke Kawahara and Sadao Kurohashi",
year = "2013",
month = "10",
day = "1",
doi = "10.1145/2523057.2523059",
language = "English",
volume = "12",
journal = "ACM Transactions on Asian Language Information Processing",
issn = "1530-0226",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY - JOUR

T1 - Chinese-Japanese machine translation exploiting Chinese characters

AU - Chu, Chenhui

AU - Nakazawa, Toshiaki

AU - Kawahara, Daisuke

AU - Kurohashi, Sadao

PY - 2013/10/1

Y1 - 2013/10/1

UR - http://www.scopus.com/inward/record.url?scp=84887056965&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84887056965&partnerID=8YFLogxK

U2 - 10.1145/2523057.2523059

DO - 10.1145/2523057.2523059

M3 - Article

AN - SCOPUS:84887056965

VL - 12

JO - ACM Transactions on Asian Language Information Processing

JF - ACM Transactions on Asian Language Information Processing

SN - 1530-0226

IS - 4

M1 - 16

ER -