Exploiting shared Chinese characters in Chinese word segmentation optimization for Chinese-Japanese Machine Translation

Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

Research output: Contribution to conference (Paper)

11 Citations (Scopus)

Abstract

Unknown words and word segmentation granularity are two main problems in Chinese word segmentation for Chinese-Japanese Machine Translation (MT). In this paper, we propose an approach that exploits the Chinese characters shared between Chinese and Japanese to optimize Chinese word segmentation for MT and address both problems. We augment the system dictionary of a Chinese segmenter with Chinese lexicons extracted from a parallel training corpus. In addition, we adjust the granularity of the segmenter's training data to match that of Japanese. Experiments with a phrase-based statistical MT (SMT) system show that our approach significantly improves Chinese-Japanese MT performance.
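The abstract does not spell out how lexicons are extracted, so the following is only a minimal sketch of one plausible reading of the dictionary-augmentation idea. It assumes the Japanese side of each sentence pair is already segmented, and it uses a tiny hypothetical character mapping table (`KANJI_TO_HANZI`); the function names and the `min_len` parameter are likewise illustrative and not taken from the paper.

```python
# Hypothetical mapping from Japanese kanji to Simplified Chinese characters.
# A real table would cover thousands of characters; these entries are
# illustrative assumptions only.
KANJI_TO_HANZI = {"機": "机", "械": "械", "翻": "翻", "訳": "译"}


def to_hanzi(word):
    """Map a Japanese word to Chinese characters.

    Returns None if any character (for example kana) is missing from the
    mapping table, since such a word cannot be matched character by character.
    """
    mapped = []
    for ch in word:
        if ch not in KANJI_TO_HANZI:
            return None
        mapped.append(KANJI_TO_HANZI[ch])
    return "".join(mapped)


def extract_lexicon(parallel_pairs, min_len=2):
    """Collect Chinese strings that correspond to a Japanese word.

    parallel_pairs: iterable of (chinese_sentence, japanese_words), where
    japanese_words is the already-segmented Japanese sentence.
    """
    lexicon = set()
    for zh_sentence, ja_words in parallel_pairs:
        for ja_word in ja_words:
            if len(ja_word) < min_len:
                continue  # skip single characters such as particles
            candidate = to_hanzi(ja_word)
            # Keep the candidate only if it really occurs on the Chinese side.
            if candidate is not None and candidate in zh_sentence:
                lexicon.add(candidate)
    return lexicon


pairs = [("机械翻译很有用", ["機械", "翻訳", "は", "有用", "だ"])]
new_entries = extract_lexicon(pairs)
print(sorted(new_entries))
```

Entries collected this way could then be appended to the segmenter's system dictionary. How the dictionary is stored and loaded depends on the segmenter and is not covered here.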

Original language: English
Pages: 35-42
Number of pages: 8
Publication status: Published - Jan 1 2012
Event: 16th Annual Conference of the European Association for Machine Translation, EAMT 2012 - Trento, Italy
Duration: May 28 2012 to May 30 2012

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Software

Cite this

Chu, C., Nakazawa, T., Kawahara, D., & Kurohashi, S. (2012). Exploiting shared Chinese characters in Chinese word segmentation optimization for Chinese-Japanese Machine Translation. 35-42. Paper presented at 16th Annual Conference of the European Association for Machine Translation, EAMT 2012, Trento, Italy.

@conference{a6448262495e4e15be1e5728963cabaa,
title = "Exploiting shared Chinese characters in Chinese word segmentation optimization for Chinese-Japanese Machine Translation",
author = "Chenhui Chu and Toshiaki Nakazawa and Daisuke Kawahara and Sadao Kurohashi",
year = "2012",
month = "1",
day = "1",
language = "English",
pages = "35--42",
note = "16th Annual Conference of the European Association for Machine Translation, EAMT 2012 ; Conference date: 28-05-2012 Through 30-05-2012",

}

TY - CONF

T1 - Exploiting shared Chinese characters in Chinese word segmentation optimization for Chinese-Japanese Machine Translation

AU - Chu, Chenhui

AU - Nakazawa, Toshiaki

AU - Kawahara, Daisuke

AU - Kurohashi, Sadao

PY - 2012/1/1

Y1 - 2012/1/1

UR - http://www.scopus.com/inward/record.url?scp=85001084858&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85001084858&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85001084858

SP - 35

EP - 42

ER -