TY - GEN
T1 - Tag recommendation for open government data by multi-label classification and particular noun phrase extraction
AU - Yamada, Yasuhiro
AU - Nakatoh, Tetsuya
N1 - Funding Information:
This work was partially supported by JSPS KAKENHI Grant Number 15K00426.
PY - 2018
Y1 - 2018
N2 - Open government data (OGD) is statistical data produced and published by governments. Administrators often assign tags to the metadata of OGD. A tag, consisting of a single word or multiple words, describes the data. Tags help users understand the data without actually reading it and also help them search for OGD. However, administrators have to understand the data in detail in order to assign tags. We take two different approaches to assigning appropriate tags to OGD. First, we use a multi-label classification technique to assign tags to OGD from the tags in the training data. Second, we extract particular noun phrases from the metadata of OGD by calculating the difference between the frequency of a noun phrase and the frequencies of the single words within the noun phrase. Experiments using 196,587 datasets on Data.gov show that the prediction accuracy of the multi-label classification method is sufficient to develop a tag recommendation system. The experiments also show that our method for extracting particular noun phrases extracts some infrequent tags of the datasets.
AB - Open government data (OGD) is statistical data produced and published by governments. Administrators often assign tags to the metadata of OGD. A tag, consisting of a single word or multiple words, describes the data. Tags help users understand the data without actually reading it and also help them search for OGD. However, administrators have to understand the data in detail in order to assign tags. We take two different approaches to assigning appropriate tags to OGD. First, we use a multi-label classification technique to assign tags to OGD from the tags in the training data. Second, we extract particular noun phrases from the metadata of OGD by calculating the difference between the frequency of a noun phrase and the frequencies of the single words within the noun phrase. Experiments using 196,587 datasets on Data.gov show that the prediction accuracy of the multi-label classification method is sufficient to develop a tag recommendation system. The experiments also show that our method for extracting particular noun phrases extracts some infrequent tags of the datasets.
UR - http://www.scopus.com/inward/record.url?scp=85059091600&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059091600&partnerID=8YFLogxK
U2 - 10.5220/0006937800830091
DO - 10.5220/0006937800830091
M3 - Conference contribution
AN - SCOPUS:85059091600
T3 - IC3K 2018 - Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
SP - 83
EP - 91
BT - IC3K 2018 - Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
A2 - Bernardino, Jorge
A2 - Salgado, Ana Carolina
A2 - Filipe, Joaquim
PB - SciTePress
T2 - 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2018
Y2 - 18 September 2018 through 20 September 2018
ER -