Discovering unpredictably related words from logs of scholarly repositories for grouping similar queries

Takehiro Shiraishi, Toshihiro Aoyama, Kazutsuna Yamaji, Takao Namiki, Daisuke Ikeda

Research output: Contribution to journal › Article

Abstract

As the number of institutional repositories increases, more and more people, including non-researchers, are accessing academic content on them via search engines. Unlike those of researchers, user models of non-researchers are not yet well understood, although non-researchers may use quite different queries from researchers. A good way to understand their search behavior is to categorize the search queries of non-researchers into groups. As a first step, this chapter is devoted to finding related query words from logs of scholarly repositories. In particular, we try to find words that are related from the viewpoint of non-researchers. In this sense, these words are unpredictably related. A simple method using the access log is to treat queries that lead to the same paper as related. However, this is challenging because one academic paper generally receives a small number of accesses, while the accesses to one paper bring many kinds of query words. Instead, we expand relationships between query words and papers, and use a graph-based algorithm in which query words and papers are vertices to find related words. In experiments, we use more than 400,000 accesses recorded at a major portal site of Japanese theses, and show that we can find related words with respect to specific disciplines if these words appear frequently. These words seem to be of interest to non-researchers, and hence we cannot say that they are not related in the usual manner. This result implies that we can obtain related words if we enrich relationships between technical terms using background knowledge, such as dictionaries.
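The idea of treating query words and papers as vertices of a graph can be illustrated with a small sketch. The abstract does not specify the actual algorithm, so the following is only an assumed baseline: the log format (pairs of query string and paper identifier), the whitespace tokenization, the function names, and the `hops` parameter are all hypothetical. With `hops=1` it corresponds to the simple method (words whose queries lead to the same paper are related); larger values follow longer word-paper-word paths, which is one possible way to expand the relationships.

```python
from collections import defaultdict


def build_bipartite(log):
    """Build a word-paper bipartite graph from (query, paper_id) pairs.

    The log format is an assumption made for this sketch.
    """
    word_to_papers = defaultdict(set)
    paper_to_words = defaultdict(set)
    for query, paper in log:
        for word in query.lower().split():
            word_to_papers[word].add(paper)
            paper_to_words[paper].add(word)
    return word_to_papers, paper_to_words


def related_words(word, word_to_papers, paper_to_words, hops=1):
    """Return words reachable from `word` within `hops` word-paper-word steps."""
    seen = {word}
    frontier = {word}
    for _ in range(hops):
        next_frontier = set()
        for w in frontier:
            # Follow every paper reached by this word to its other query words.
            for paper in word_to_papers.get(w, ()):
                next_frontier |= paper_to_words[paper]
        next_frontier -= seen
        seen |= next_frontier
        frontier = next_frontier
    return seen - {word}


if __name__ == "__main__":
    # Toy access log; identifiers and queries are invented for illustration.
    toy_log = [
        ("protein folding", "thesis-1"),
        ("folding simulation", "thesis-1"),
        ("simulation weather", "thesis-2"),
    ]
    w2p, p2w = build_bipartite(toy_log)
    print(sorted(related_words("protein", w2p, p2w, hops=1)))
    print(sorted(related_words("protein", w2p, p2w, hops=2)))
```

In the toy log, "protein" and "weather" never lead to the same paper, so the one-hop baseline misses the pair; the two-hop walk links them through "simulation". Whether such a link is meaningful is exactly the kind of question the experiments in the chapter address.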

Original language: English
Pages (from-to): 47-60
Number of pages: 14
Journal: Studies in Computational Intelligence
Volume: 553
DOI: https://doi.org/10.1007/978-3-319-05717-0_4
Publication status: Published - Jan 1 2014

Fingerprint

Terminology
Glossaries
Search engines
Experiments

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence

Cite this

Discovering unpredictably related words from logs of scholarly repositories for grouping similar queries. / Shiraishi, Takehiro; Aoyama, Toshihiro; Yamaji, Kazutsuna; Namiki, Takao; Ikeda, Daisuke.

In: Studies in Computational Intelligence, Vol. 553, 01.01.2014, p. 47-60.

@article{5a3481a96c0c4be8837493ea28e991f9,
title = "Discovering unpredictably related words from logs of scholarly repositories for grouping similar queries",
author = "Takehiro Shiraishi and Toshihiro Aoyama and Kazutsuna Yamaji and Takao Namiki and Daisuke Ikeda",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/978-3-319-05717-0_4",
language = "English",
volume = "553",
pages = "47--60",
journal = "Studies in Computational Intelligence",
issn = "1860-949X",
publisher = "Springer Verlag",

}
