Weighting of noun phrases based on local frequency of nouns

Yasuhiro Yamada, Yuusuke Himeno, Tetsuya Nakatoh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Tf-idf is a well-known weighting measure for words in texts. It measures both the frequency and the locality of words, and it is often used in information retrieval and text mining. However, many infrequent words share the same tf-idf value. In this study, the words considered are noun phrases. This paper proposes a novel weighting measure for noun phrases in texts that uses the local frequency of the nouns that make up a noun phrase. The proposed measure is calculated by combining the tf-idf of a noun phrase with the average of the difference between its frequency and the frequency of the nouns within the phrase. The proposed measure was evaluated in experiments on two datasets: 19,997 newsgroup texts written in English and 206 Wikipedia pages written in Japanese. The experiments showed that fewer noun phrases share the same value of the proposed measure than share the same tf-idf value.
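The idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: documents are modeled as lists of noun phrases (tuples of nouns), the function names `tf_idf`, `avg_noun_gap`, and `weight` are hypothetical, the difference is taken as noun frequency minus phrase frequency, and the two quantities are simply kept as a pair rather than combined the way the paper combines them.

```python
import math
from collections import Counter

def tf_idf(phrase, doc, docs):
    """Plain tf-idf: raw count in doc times log(N / document frequency)."""
    tf = doc.count(phrase)
    df = sum(1 for d in docs if phrase in d)
    return tf * math.log(len(docs) / df) if df else 0.0

def avg_noun_gap(phrase, doc):
    """Average over the nouns in the phrase of
    (frequency of the noun in doc) - (frequency of the phrase in doc).
    The sign convention is an assumption, not taken from the paper."""
    phrase_freq = doc.count(phrase)
    noun_freq = Counter(noun for p in doc for noun in p)
    return sum(noun_freq[n] - phrase_freq for n in phrase) / len(phrase)

def weight(phrase, doc, docs):
    """Hypothetical combination: keep (tf-idf, gap) as a pair so that
    the gap only breaks ties between equal tf-idf values."""
    return (tf_idf(phrase, doc, docs), avg_noun_gap(phrase, doc))

# Toy corpus (made up for this example): each document is a list of noun phrases.
docs = [
    [("information", "retrieval"), ("information", "system"),
     ("query", "log"), ("text", "mining")],
    [("text", "mining"), ("data", "mining")],
    [("neural", "network")],
]
doc = docs[0]
a = ("information", "retrieval")
b = ("query", "log")

# Both phrases occur once, in one document only, so their tf-idf values tie.
print(tf_idf(a, doc, docs), tf_idf(b, doc, docs))
# The noun "information" also occurs in another phrase, so the gap differs.
print(weight(a, doc, docs), weight(b, doc, docs))
```

In this toy corpus the two phrases are indistinguishable by tf-idf alone, while the noun-frequency term separates them, which is the kind of tie reduction the experiments report.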

Original language: English
Title of host publication: Recent Advances on Soft Computing and Data Mining - Proceedings of the 3rd International Conference on Soft Computing and Data Mining SCDM 2018
Editors: Jemal H. Abawajy, Rozaida Ghazali, Mustafa Mat Deris, Nazri Mohd Nawi
Publisher: Springer Verlag
Pages: 436-445
Number of pages: 10
ISBN (Print): 9783319725499
DOIs: https://doi.org/10.1007/978-3-319-72550-5_42
Publication status: Published - Jan 1 2018
Event: 3rd International Conference on Soft Computing and Data Mining, SCDM 2018 - Johor, Malaysia
Duration: Feb 6 2018 to Feb 8 2018

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 700
ISSN (Print): 2194-5357

Other

Other: 3rd International Conference on Soft Computing and Data Mining, SCDM 2018
Country: Malaysia
City: Johor
Period: 2/6/18 to 2/8/18

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science (all)

Cite this

Yamada, Y., Himeno, Y., & Nakatoh, T. (2018). Weighting of noun phrases based on local frequency of nouns. In J. H. Abawajy, R. Ghazali, M. M. Deris, & N. M. Nawi (Eds.), Recent Advances on Soft Computing and Data Mining - Proceedings of the 3rd International Conference on Soft Computing and Data Mining SCDM 2018 (pp. 436-445). (Advances in Intelligent Systems and Computing; Vol. 700). Springer Verlag. https://doi.org/10.1007/978-3-319-72550-5_42