Tag recommendation for open government data by multi-label classification and particular noun phrase extraction

Yasuhiro Yamada, Tetsuya Nakatoh

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Open government data (OGD) is statistical data produced and published by governments. Administrators often assign tags to the metadata of OGD. A tag, which consists of one or more words, describes the data. Tags help users understand the data without reading it and also help them search for OGD. However, administrators must understand the data in detail in order to assign tags. We take two different approaches to assigning appropriate tags to OGD. First, we use a multi-label classification technique to assign tags to OGD from the tags in the training data. Second, we extract particular noun phrases from the metadata of OGD by calculating the difference between the frequency of a noun phrase and the frequencies of the single words within the noun phrase. Experiments using 196,587 datasets on Data.gov show that the prediction accuracy of the multi-label classification method is sufficient for developing a tag recommendation system. The experiments also show that our method for extracting particular noun phrases recovers some infrequent tags of the datasets.
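
The second approach can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the scoring below is an assumption made for illustration only: the "gap" of a candidate phrase is taken to be the highest frequency of any single word in the phrase minus the frequency of the phrase itself. A small gap suggests that the words rarely occur outside the phrase. The function names `count_occurrences` and `phrase_gaps`, the whitespace tokenization, and the sample metadata strings are all hypothetical, and the candidate noun phrases are assumed to have been identified beforehand (for example, by a part-of-speech tagger).

```python
from collections import Counter


def count_occurrences(tokens, phrase):
    """Count how often the word sequence `phrase` occurs in `tokens`."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def phrase_gaps(documents, candidates):
    """Return {phrase: gap} for every candidate that occurs at least once.

    gap = (highest single-word frequency in the phrase) - (phrase frequency).
    This scoring is an illustrative assumption, not the paper's formula.
    """
    # Naive tokenization: lowercase and split on whitespace.
    token_lists = [doc.lower().split() for doc in documents]
    word_freq = Counter(tok for toks in token_lists for tok in toks)

    gaps = {}
    for cand in candidates:
        words = cand.lower().split()
        phrase_freq = sum(count_occurrences(toks, words) for toks in token_lists)
        if phrase_freq == 0:
            continue  # Skip candidates that never appear in the metadata.
        gaps[cand] = max(word_freq[w] for w in words) - phrase_freq
    return gaps


# Hypothetical metadata strings, standing in for dataset titles or descriptions.
docs = [
    "air quality index by county",
    "annual air quality index report",
    "water quality report by county",
]
gaps = phrase_gaps(docs, ["air quality index", "quality report", "air quality", "soil moisture"])

# Candidates with the smallest gap are listed first.
for phrase, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(phrase, gap)
```

In this sketch, candidates with a small gap would be proposed as tags. A threshold on the gap, or a normalization by phrase length, would be a design choice that the abstract does not specify.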

    Original language: English
    Title of host publication: IC3K 2018 - Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
    Editors: Jorge Bernardino, Ana Carolina Salgado, Joaquim Filipe
    Publisher: SciTePress
    Pages: 83-91
    Number of pages: 9
    ISBN (Electronic): 9789897583308
    Publication status: Published - 2018
    Event: 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2018 - Seville, Spain
    Duration: Sep 18 2018 to Sep 20 2018

    Publication series

    Name: IC3K 2018 - Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
    Volume: 3

    Other

    Other: 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2018
    Country: Spain
    City: Seville
    Period: 9/18/18 to 9/20/18

    All Science Journal Classification (ASJC) codes

    • Software
