Identification of Unnatural Subsets in Statistical Data

Takahiko Suzuki, Tssukasa Kamimasu, Tetsuya Nakatoh, Sachio Hirokawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Benford's law describes the frequency distribution of first significant digits in naturally occurring numerical data. The unnaturalness of a data set can be measured by how far the frequency distribution of its leading digits diverges from Benford's distribution. However, such a measure applied to the data as a whole cannot pinpoint which part of the data is unnatural. In this study, we focus on the fact that statistical data is generally provided in tabular form. We specify a subset of the target data by using the item names of the rows and columns that define each cell of a table, or words appearing in the table title. By measuring the degree of divergence of each subset from Benford's distribution, we can identify unnatural subsets. We applied this method to agriculture-related data from the China Statistical Yearbook and succeeded in identifying unnatural subsets.
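
The idea in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: Benford's law gives the expected probability of leading digit d as log10(1 + 1/d), but the abstract does not name the divergence measure, so Kullback-Leibler divergence is assumed here. The cell schema (`title`, `row`, `col`, `value`), the function names, and the `min_size` threshold are likewise hypothetical.

```python
import math
from collections import Counter

# Benford's expected probability for leading digit d: log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of x, or None if it has none (0, inf, nan)."""
    # The first non-zero digit character in repr() is the first significant digit,
    # e.g. '0.042' -> 4, '1e-07' -> 1, '120' -> 1.
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None


def benford_divergence(values):
    """KL divergence of the observed leading-digit distribution from Benford's.

    ASSUMPTION: KL divergence stands in for the unspecified measure in the abstract.
    """
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        return None
    n = len(digits)
    counts = Counter(digits)
    return sum((c / n) * math.log((c / n) / BENFORD[d]) for d, c in counts.items())


def rank_subsets(cells, min_size=30):
    """Group cell values by row name, column name, and title word; rank by divergence.

    `cells` is a hypothetical list of dicts with keys 'title', 'row', 'col', 'value'.
    Subsets smaller than `min_size` are skipped because their digit counts are noisy.
    """
    groups = {}
    for cell in cells:
        labels = {cell["row"], cell["col"], *cell["title"].split()}
        for label in labels:
            groups.setdefault(label, []).append(cell["value"])
    scored = []
    for label, vals in groups.items():
        if len(vals) < min_size:
            continue
        score = benford_divergence(vals)
        if score is not None:
            scored.append((label, score, len(vals)))
    # Most divergent (most "unnatural") subsets first
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Calling `rank_subsets(cells)` returns `(label, divergence, size)` tuples, so the labels at the top of the list are the candidate unnatural subsets. A real analysis would also need a significance threshold, which this sketch leaves out.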

Original language: English
Title of host publication: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 74-80
Number of pages: 7
ISBN (Electronic): 9781538674475
DOIs: https://doi.org/10.1109/IIAI-AAI.2018.00024
Publication status: Published - Jul 2 2018
Event: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018 - Yonago, Japan
Duration: Jul 8 2018 → Jul 13 2018

Publication series

Name: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

Conference

Conference: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Country: Japan
City: Yonago
Period: 7/8/18 → 7/13/18

Fingerprint

Agriculture
frequency distribution
divergence
China
Law

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Communication
  • Information Systems
  • Information Systems and Management
  • Education

Cite this

Suzuki, T., Kamimasu, T., Nakatoh, T., & Hirokawa, S. (2018). Identification of Unnatural Subsets in Statistical Data. In Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018 (pp. 74-80). [8693375] (Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IIAI-AAI.2018.00024

@inproceedings{fc2114ab94214d4d912a0124a1b4a0dc,
title = "Identification of Unnatural Subsets in Statistical Data",
abstract = "Benford's law describes the frequency distribution of first significant digits in naturally occurring numerical data. The unnaturalness of a data set can be measured by how far the frequency distribution of its leading digits diverges from Benford's distribution. However, such a measure applied to the data as a whole cannot pinpoint which part of the data is unnatural. In this study, we focus on the fact that statistical data is generally provided in tabular form. We specify a subset of the target data by using the item names of the rows and columns that define each cell of a table, or words appearing in the table title. By measuring the degree of divergence of each subset from Benford's distribution, we can identify unnatural subsets. We applied this method to agriculture-related data from the China Statistical Yearbook and succeeded in identifying unnatural subsets.",
author = "Takahiko Suzuki and Tssukasa Kamimasu and Tetsuya Nakatoh and Sachio Hirokawa",
year = "2018",
month = "7",
day = "2",
doi = "10.1109/IIAI-AAI.2018.00024",
language = "English",
series = "Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "74--80",
booktitle = "Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018",
address = "United States",

}

TY - GEN

T1 - Identification of Unnatural Subsets in Statistical Data

AU - Suzuki, Takahiko

AU - Kamimasu, Tssukasa

AU - Nakatoh, Tetsuya

AU - Hirokawa, Sachio

PY - 2018/7/2

Y1 - 2018/7/2

AB - Benford's law describes the frequency distribution of first significant digits in naturally occurring numerical data. The unnaturalness of a data set can be measured by how far the frequency distribution of its leading digits diverges from Benford's distribution. However, such a measure applied to the data as a whole cannot pinpoint which part of the data is unnatural. In this study, we focus on the fact that statistical data is generally provided in tabular form. We specify a subset of the target data by using the item names of the rows and columns that define each cell of a table, or words appearing in the table title. By measuring the degree of divergence of each subset from Benford's distribution, we can identify unnatural subsets. We applied this method to agriculture-related data from the China Statistical Yearbook and succeeded in identifying unnatural subsets.

UR - http://www.scopus.com/inward/record.url?scp=85065195514&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065195514&partnerID=8YFLogxK

U2 - 10.1109/IIAI-AAI.2018.00024

DO - 10.1109/IIAI-AAI.2018.00024

M3 - Conference contribution

AN - SCOPUS:85065195514

T3 - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

SP - 74

EP - 80

BT - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -