Identification of Unnatural Subsets in Statistical Data

Takahiko Suzuki, Tssukasa Kamimasu, Tetsuya Nakatoh, Sachio Hirokawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Benford's law is an observation about the frequency distribution of first significant digits in natural numerical data. The unnaturalness of a data set can be measured by how far the frequency distribution of its leading digits diverges from Benford's distribution. However, this measure alone cannot pinpoint which part of the data is unnatural. In this study, we focus on the fact that statistical data is generally provided in tabular form. We specify a subset of the target data by using the row and column item names that define each cell of the table, or words appearing in the table title. By measuring the degree of divergence of each subset from Benford's distribution, we can identify unnatural subsets. We applied this method to agriculture-related data from the China Statistical Yearbook and succeeded in identifying unnatural subsets.
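The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Benford's law gives the expected probability of leading digit d as log10(1 + 1/d). The abstract does not say which divergence measure the paper uses, so the Kullback-Leibler divergence here is an assumption. The function names, the `min_size` threshold, and the representation of each table cell as a (label words, value) pair are likewise hypothetical.

```python
import math
from collections import Counter

# Benford's law: expected probability of leading digit d (1..9).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """Return the first significant digit of a number, or None if it has none."""
    value = abs(value)
    if value == 0 or math.isinf(value) or math.isnan(value):
        return None
    # In scientific notation the first character is the first significant digit.
    return int(f"{value:.15e}"[0])


def benford_divergence(values):
    """KL divergence (assumed measure) of observed leading digits from Benford."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no values with a leading digit")
    n = len(digits)
    return sum(
        (c / n) * math.log((c / n) / BENFORD[d])
        for d, c in Counter(digits).items()
    )


def rank_subsets(cells, min_size=30):
    """Group cell values by each label word; rank groups by divergence.

    `cells` is an iterable of (labels, value) pairs, where `labels` holds the
    words from the row name, column name, and table title of that cell
    (a hypothetical representation). Returns (word, size, divergence) tuples,
    most divergent first.
    """
    groups = {}
    for labels, value in cells:
        if leading_digit(value) is None:
            continue  # zeros and non-finite values carry no leading digit
        for word in labels:
            groups.setdefault(word, []).append(value)
    scored = [
        (word, len(vals), benford_divergence(vals))
        for word, vals in groups.items()
        if len(vals) >= min_size  # tiny subsets give unreliable estimates
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

Calling `rank_subsets` on cells extracted from a set of tables returns the label words whose cells deviate most from Benford's distribution at the top of the list. The size threshold is a design choice for this sketch: with only a handful of values, a large divergence can arise by chance.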

Original language: English
Title of host publication: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 74-80
Number of pages: 7
ISBN (electronic): 9781538674475
DOI: https://doi.org/10.1109/IIAI-AAI.2018.00024
Publication status: Published - Jul 2, 2018
Event: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018 - Yonago, Japan
Duration: Jul 8, 2018 to Jul 13, 2018

Publication series

Name: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

Conference

Conference: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Japan
Yonago
Period: 7/8/18 to 7/13/18

Fingerprint

Agriculture
frequency distribution
divergence
China
Law

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Communication
  • Information Systems
  • Information Systems and Management
  • Education

Cite this

Suzuki, T., Kamimasu, T., Nakatoh, T., & Hirokawa, S. (2018). Identification of Unnatural Subsets in Statistical Data. In Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018 (pp. 74-80). [8693375] (Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IIAI-AAI.2018.00024

@inproceedings{fc2114ab94214d4d912a0124a1b4a0dc,
title = "Identification of Unnatural Subsets in Statistical Data",
author = "Takahiko Suzuki and Tssukasa Kamimasu and Tetsuya Nakatoh and Sachio Hirokawa",
year = "2018",
month = "7",
day = "2",
doi = "10.1109/IIAI-AAI.2018.00024",
language = "English",
series = "Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "74--80",
booktitle = "Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018",
address = "United States",

}

TY - GEN

T1 - Identification of Unnatural Subsets in Statistical Data

AU - Suzuki, Takahiko

AU - Kamimasu, Tssukasa

AU - Nakatoh, Tetsuya

AU - Hirokawa, Sachio

PY - 2018/7/2

Y1 - 2018/7/2

UR - http://www.scopus.com/inward/record.url?scp=85065195514&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065195514&partnerID=8YFLogxK

U2 - 10.1109/IIAI-AAI.2018.00024

DO - 10.1109/IIAI-AAI.2018.00024

M3 - Conference contribution

AN - SCOPUS:85065195514

T3 - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

SP - 74

EP - 80

BT - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -