Estimating satisfactoriness of selectional restriction from corpus without a thesaurus

Yoichi Tomiura, Shosaku Tanaka, Toru Hitaka

Research output: Contribution to journal › Review article

Abstract

A selectional restriction specifies which combinations of words are semantically valid in a particular syntactic construction. It is one of the basic and most important kinds of knowledge in natural language processing and has been used for both syntactic and word sense disambiguation. When acquiring selectional restrictions for many word combinations from a corpus, it is necessary to estimate whether a word combination that is not observed in the corpus satisfies the selectional restriction. This paper proposes a new method, based on a multiple regression model, for estimating from a tagged corpus the degree to which a word combination satisfies the selectional restriction. The independent variables of this model correspond to modifiers. Unlike in conventional multiple regression analysis, the independent variables are themselves parameters to be learned. We evaluate the method by estimating the degree of satisfaction of the selectional restriction for Japanese (noun, postpositional particle, verb) combinations. The experimental results indicate that our method can estimate the degree of satisfaction for word combinations that are rarely observed in the corpus, and that syntactic disambiguation using the co-occurrences estimated by our method is more accurate than disambiguation using co-occurrence probabilities smoothed by previous methods.
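The abstract does not spell out the model's equations. As a rough illustration only, the sketch below assumes one possible reading: with the particle fixed, the degree of satisfaction for each verb is a linear regression on a per-noun vector, and those noun vectors (the "independent variables", one per modifier) are fitted jointly with the regression coefficients. That turns the model into a bilinear, latent-factor model, trained here by stochastic gradient descent. This is an assumption, not the authors' formulation; the function names (`fit_bilinear`, `predict`, `choose_attachment`), the romanized toy nouns and verbs, the scores, and the hyperparameters are all hypothetical.

```python
import random

# Hypothetical toy data: scores for Japanese (noun, "wo", verb) combinations,
# e.g. a normalized co-occurrence value in [0, 1]. Pairs missing from the dict
# are treated as unknown (to be estimated), not as zero.
NOUNS = ["pan", "gohan", "sushi", "mizu", "ocha", "juusu"]  # bread, rice, sushi, water, tea, juice
VERBS = ["taberu", "nomu"]  # eat, drink
OBSERVED = {
    ("pan", "taberu"): 1.0, ("gohan", "taberu"): 1.0, ("sushi", "taberu"): 1.0,
    ("mizu", "taberu"): 0.0, ("ocha", "taberu"): 0.0,
    ("pan", "nomu"): 0.0, ("gohan", "nomu"): 0.0,
    ("mizu", "nomu"): 1.0, ("ocha", "nomu"): 1.0, ("juusu", "nomu"): 1.0,
}


def fit_bilinear(observed, nouns, verbs, k=1, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Fit score(n, v) ~ b[v] + sum_j w[v][j] * x[n][j] by stochastic gradient descent.

    For each verb v this is a linear regression with coefficients w[v] and
    intercept b[v]; the regressors x[n] (one vector per modifier noun) are
    learned jointly instead of being given, so no thesaurus is used.
    """
    rng = random.Random(seed)
    x = {n: [rng.uniform(-0.1, 0.1) for _ in range(k)] for n in nouns}
    w = {v: [rng.uniform(-0.1, 0.1) for _ in range(k)] for v in verbs}
    b = {v: 0.0 for v in verbs}
    pairs = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(pairs)
        for (n, v), target in pairs:
            err = predict(x, w, b, n, v) - target
            b[v] -= lr * err
            for j in range(k):
                wj, xj = w[v][j], x[n][j]  # old values, so both updates use the same point
                w[v][j] -= lr * (err * xj + reg * wj)  # L2-regularized squared-error gradient
                x[n][j] -= lr * (err * wj + reg * xj)
    return x, w, b


def predict(x, w, b, noun, verb):
    """Estimated degree of satisfaction for (noun, verb), observed or not."""
    return b[verb] + sum(wj * xj for wj, xj in zip(w[verb], x[noun]))


def choose_attachment(x, w, b, noun, candidate_verbs):
    """Attach the noun phrase to the candidate verb with the highest estimated score."""
    return max(candidate_verbs, key=lambda v: predict(x, w, b, noun, v))


if __name__ == "__main__":
    x, w, b = fit_bilinear(OBSERVED, NOUNS, VERBS)
    for noun, verb in [("sushi", "nomu"), ("juusu", "taberu")]:  # pairs absent from OBSERVED
        print(noun, verb, round(predict(x, w, b, noun, verb), 2))
    print(choose_attachment(x, w, b, "sushi", ["nomu", "taberu"]))
```

Because each noun vector is learned from the verbs the noun occurs with, an unobserved pair such as (sushi, nomu) inherits its score from nouns that behave similarly (here pan and gohan); this is how such a model can generalize without a thesaurus. `choose_attachment` shows, under the same assumptions, how estimated scores could decide which of two candidate verbs a noun phrase attaches to.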

Original language: English
Pages (from-to): 400-416
Number of pages: 17
Journal: ACM Transactions on Asian Language Information Processing
Volume: 4
Issue number: 4
DOI: 10.1145/1113308.1113311
Publication status: Published - Dec 1 2005

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Estimating satisfactoriness of selectional restriction from corpus without a thesaurus. / Tomiura, Yoichi; Tanaka, Shosaku; Hitaka, Toru.

In: ACM Transactions on Asian Language Information Processing, Vol. 4, No. 4, 01.12.2005, p. 400-416.

@article{992c8cf683d94773a5c5ab7d2896229b,
title = "Estimating satisfactoriness of selectional restriction from corpus without a thesaurus",
abstract = "A selectional restriction specifies which combinations of words are semantically valid in a particular syntactic construction. It is one of the basic and most important kinds of knowledge in natural language processing and has been used for both syntactic and word sense disambiguation. When acquiring selectional restrictions for many word combinations from a corpus, it is necessary to estimate whether a word combination that is not observed in the corpus satisfies the selectional restriction. This paper proposes a new method, based on a multiple regression model, for estimating from a tagged corpus the degree to which a word combination satisfies the selectional restriction. The independent variables of this model correspond to modifiers. Unlike in conventional multiple regression analysis, the independent variables are themselves parameters to be learned. We evaluate the method by estimating the degree of satisfaction of the selectional restriction for Japanese (noun, postpositional particle, verb) combinations. The experimental results indicate that our method can estimate the degree of satisfaction for word combinations that are rarely observed in the corpus, and that syntactic disambiguation using the co-occurrences estimated by our method is more accurate than disambiguation using co-occurrence probabilities smoothed by previous methods.",
author = "Yoichi Tomiura and Shosaku Tanaka and Toru Hitaka",
year = "2005",
month = "12",
day = "1",
doi = "10.1145/1113308.1113311",
language = "English",
volume = "4",
pages = "400--416",
journal = "ACM Transactions on Asian Language Information Processing",
issn = "1530-0226",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY - JOUR

T1 - Estimating satisfactoriness of selectional restriction from corpus without a thesaurus

AU - Tomiura, Yoichi

AU - Tanaka, Shosaku

AU - Hitaka, Toru

PY - 2005/12/1

Y1 - 2005/12/1

AB - A selectional restriction specifies which combinations of words are semantically valid in a particular syntactic construction. It is one of the basic and most important kinds of knowledge in natural language processing and has been used for both syntactic and word sense disambiguation. When acquiring selectional restrictions for many word combinations from a corpus, it is necessary to estimate whether a word combination that is not observed in the corpus satisfies the selectional restriction. This paper proposes a new method, based on a multiple regression model, for estimating from a tagged corpus the degree to which a word combination satisfies the selectional restriction. The independent variables of this model correspond to modifiers. Unlike in conventional multiple regression analysis, the independent variables are themselves parameters to be learned. We evaluate the method by estimating the degree of satisfaction of the selectional restriction for Japanese (noun, postpositional particle, verb) combinations. The experimental results indicate that our method can estimate the degree of satisfaction for word combinations that are rarely observed in the corpus, and that syntactic disambiguation using the co-occurrences estimated by our method is more accurate than disambiguation using co-occurrence probabilities smoothed by previous methods.

UR - http://www.scopus.com/inward/record.url?scp=33745176969&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745176969&partnerID=8YFLogxK

U2 - 10.1145/1113308.1113311

DO - 10.1145/1113308.1113311

M3 - Review article

AN - SCOPUS:33745176969

VL - 4

SP - 400

EP - 416

JO - ACM Transactions on Asian Language Information Processing

JF - ACM Transactions on Asian Language Information Processing

SN - 1530-0226

IS - 4

ER -