CEFR-based lexical simplification dataset

Satoru Uchida, Shohei Takada, Yuki Arase

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This study creates a dataset for lexical simplification based on Common European Framework of Reference for Languages (CEFR) levels, called CEFR-LS. Lexical simplification remains an important task for language learning and education. Several language resources for lexical simplification are available for generating rules and for building simplifiers with machine learning. However, these resources are not tailored to language education: their word levels and candidate lists tend to be subjective. In contrast, the present study constructs CEFR-LS, in which target and candidate words are assigned CEFR levels using the CEFR-J wordlists and the English Vocabulary Profile, and candidates are selected using an online thesaurus. Because CEFR is widely used around the world, using CEFR levels makes it possible to apply a simplification method based on this dataset directly to language education. CEFR-LS currently includes 406 targets and 4912 candidates. To evaluate the validity of CEFR-LS for machine learning, two basic models are employed for selecting candidates, and the results are presented as a reference for future users of the dataset.
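
The abstract describes entries in which a target word and its candidate substitutes each carry a CEFR level. As a rough illustration of how such an entry could be represented and filtered, the following Python sketch keeps only candidates at a lower CEFR level than the target. Everything in it is an assumption made for illustration: the names `Entry`, `Candidate`, and `simpler_candidates` are hypothetical, the sample words and levels are invented rather than taken from CEFR-LS, and the level filter is not one of the two basic models evaluated in the paper.

```python
from dataclasses import dataclass

# CEFR levels ordered from easiest (A1) to hardest (C2).
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


@dataclass(frozen=True)
class Candidate:
    word: str
    level: str  # CEFR level of the candidate word


@dataclass(frozen=True)
class Entry:
    target: str
    target_level: str  # CEFR level of the target word
    candidates: tuple  # tuple of Candidate objects


def level_rank(level: str) -> int:
    """Map a CEFR level label to its position in CEFR_ORDER."""
    return CEFR_ORDER.index(level)


def simpler_candidates(entry: Entry) -> list:
    """Return candidates strictly easier than the target, easiest first."""
    target_rank = level_rank(entry.target_level)
    simpler = [c for c in entry.candidates if level_rank(c.level) < target_rank]
    # sorted() is stable, so candidates at the same level keep their input order.
    return sorted(simpler, key=lambda c: level_rank(c.level))


# Invented example entry (hypothetical words and levels, not from CEFR-LS).
example = Entry(
    target="purchase",
    target_level="B1",
    candidates=(
        Candidate("buy", "A1"),
        Candidate("acquire", "B2"),
        Candidate("get", "A1"),
    ),
)

if __name__ == "__main__":
    for cand in simpler_candidates(example):
        print(cand.word, cand.level)
```

A level filter like this only narrows the candidate list by difficulty; deciding whether a candidate actually fits the sentence context is a separate problem.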

Original language: English
Title of host publication: LREC 2018 - 11th International Conference on Language Resources and Evaluation
Editors: Hitoshi Isahara, Bente Maegaard, Stelios Piperidis, Christopher Cieri, Thierry Declerck, Koiti Hasida, Helene Mazo, Khalid Choukri, Sara Goggi, Joseph Mariani, Asuncion Moreno, Nicoletta Calzolari, Jan Odijk, Takenobu Tokunaga
Publisher: European Language Resources Association (ELRA)
Pages: 3254-3258
Number of pages: 5
ISBN (Electronic): 9791095546009
Publication status: Published - 2019
Event: 11th International Conference on Language Resources and Evaluation, LREC 2018 - Miyazaki, Japan
Duration: 7 May 2018 to 12 May 2018

Publication series

Name: LREC 2018 - 11th International Conference on Language Resources and Evaluation

Conference

Conference: 11th International Conference on Language Resources and Evaluation, LREC 2018
Country: Japan
City: Miyazaki
Period: 7 May 2018 to 12 May 2018

All Science Journal Classification (ASJC) codes

  • Linguistics and Language
  • Education
  • Library and Information Sciences
  • Language and Linguistics
