Building a diverse document leads corpus annotated with semantic relations

Masatsugu Hangyo, Daisuke Kawahara, Sadao Kurohashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

In recent years, semantic analysis has been actively studied in natural language processing. Corpora with semantic annotations are essential for this research. Although such corpora exist for newspaper articles, text comes in many other genres and styles, which include linguistic expressions not found in newspaper articles. In this paper, we build a diverse document leads corpus annotated with semantic relations. To reduce annotator workload and cover as wide a variety of documents as possible, we restrict the annotation target of each document to its first three sentences. We have completed a corpus of 1,000 documents and report its statistics.

Original language: English
Title of host publication: Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012
Pages: 535-544
Number of pages: 10
Publication status: Published - Dec 1 2012
Event: 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012 - Bali, Indonesia
Duration: Nov 7 2012 → Nov 7 2012

Publication series

Name: Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012

Other

Other: 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012
Country: Indonesia
City: Bali
Period: 11/7/12 → 11/7/12

Fingerprint

Semantics
Linguistics
Statistics
Processing

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software

Cite this

Hangyo, M., Kawahara, D., & Kurohashi, S. (2012). Building a diverse document leads corpus annotated with semantic relations. In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012 (pp. 535-544). (Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012).

@inproceedings{baeafe4624e4411bb0a37ebe07780553,
title = "Building a diverse document leads corpus annotated with semantic relations",
author = "Masatsugu Hangyo and Daisuke Kawahara and Sadao Kurohashi",
year = "2012",
month = "12",
day = "1",
language = "English",
isbn = "9789791421171",
series = "Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012",
pages = "535--544",
booktitle = "Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012",

}

TY - GEN

T1 - Building a diverse document leads corpus annotated with semantic relations

AU - Hangyo, Masatsugu

AU - Kawahara, Daisuke

AU - Kurohashi, Sadao

PY - 2012/12/1

Y1 - 2012/12/1

UR - http://www.scopus.com/inward/record.url?scp=84883341328&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883341328&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84883341328

SN - 9789791421171

T3 - Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012

SP - 535

EP - 544

BT - Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012

ER -