TY - JOUR
T1 - Integrating multiple dependency corpora for inducing wide-coverage Japanese CCG resources
AU - Uematsu, Sumire
AU - Matsuzaki, Takuya
AU - Hanaoka, Hiroki
AU - Miyao, Yusuke
AU - Mima, Hideki
N1 - Publisher Copyright:
© 2015 ACM
PY - 2015/1
Y1 - 2015/1
N2 - A novel method to induce wide-coverage Combinatory Categorial Grammar (CCG) resources for Japanese is proposed in this article. For some languages including English, the availability of large annotated corpora and the development of data-based induction of lexicalized grammar have enabled deep parsing, i.e., parsing based on lexicalized grammars. However, deep parsing for Japanese has not been widely studied. This is mainly because most Japanese syntactic resources are represented in chunk-based dependency structures, while previous methods for inducing grammars are dependent on tree corpora. To translate syntactic information presented in chunk-based dependencies to phrase structures as accurately as possible, integration of annotation from multiple dependency-based corpora is proposed. Our method first integrates dependency structures and predicate-argument information and converts them into phrase structure trees. The trees are then transformed into CCG derivations in a similar way to previously proposed methods. The quality of the conversion is empirically evaluated in terms of the coverage of the obtained CCG lexicon and the accuracy of the parsing with the grammar. While the transforming process used in this study is specialized for Japanese, the framework of our method would be applicable to other languages for which dependency-based analysis has been regarded as more appropriate than phrase structure-based analysis due to morphosyntactic features.
AB - A novel method to induce wide-coverage Combinatory Categorial Grammar (CCG) resources for Japanese is proposed in this article. For some languages including English, the availability of large annotated corpora and the development of data-based induction of lexicalized grammar have enabled deep parsing, i.e., parsing based on lexicalized grammars. However, deep parsing for Japanese has not been widely studied. This is mainly because most Japanese syntactic resources are represented in chunk-based dependency structures, while previous methods for inducing grammars are dependent on tree corpora. To translate syntactic information presented in chunk-based dependencies to phrase structures as accurately as possible, integration of annotation from multiple dependency-based corpora is proposed. Our method first integrates dependency structures and predicate-argument information and converts them into phrase structure trees. The trees are then transformed into CCG derivations in a similar way to previously proposed methods. The quality of the conversion is empirically evaluated in terms of the coverage of the obtained CCG lexicon and the accuracy of the parsing with the grammar. While the transforming process used in this study is specialized for Japanese, the framework of our method would be applicable to other languages for which dependency-based analysis has been regarded as more appropriate than phrase structure-based analysis due to morphosyntactic features.
UR - http://www.scopus.com/inward/record.url?scp=85044569355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85044569355&partnerID=8YFLogxK
U2 - 10.1145/2658997
DO - 10.1145/2658997
M3 - Article
AN - SCOPUS:85044569355
VL - 14
JO - ACM Transactions on Asian and Low-Resource Language Information Processing
JF - ACM Transactions on Asian and Low-Resource Language Information Processing
SN - 2375-4699
IS - 1
M1 - 1
ER -