Challenges in classifying privacy policies by machine learning with word-based features

Keishiro Fukushima, Daisuke Ikeda, Toru Nakamura, Shinsaku Kiyomoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we discuss the challenges of automatically classifying privacy policies using machine learning with words as features. Because privacy policies are difficult for the general public to understand, users need support in interpreting them. We believe that machine learning is a promising approach, because users can grasp the meaning of a policy through the outputs of a machine learning algorithm. Our ultimate goal is to develop a system that automatically translates privacy policies into privacy labels [1]. Toward this goal, we classify sentences in privacy policies into categories using popular machine learning algorithms, such as a naive Bayes classifier. We chose these algorithms because the trained classifiers can be used to evaluate which keywords are appropriate for privacy labels; for this reason, we use words as the features. Experimental results show an accuracy of about 85%. We believe that much higher accuracy is necessary to achieve our final goal. By varying the learning settings, we identified one reason for the limited accuracy: privacy policies contain many sentences that do not directly describe information about the categories. Such sentences may seem redundant, but they may be essential in legal documents to prevent misinterpretation. Thus, it is important for machine learning algorithms to handle these redundant sentences appropriately.
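The abstract describes classifying policy sentences into categories with a naive Bayes classifier over word features, then using the trained classifier to evaluate keywords. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: a multinomial naive Bayes with Laplace smoothing over bag-of-words counts, a simple regex tokenizer, and a keyword score defined as the log-likelihood margin of a word over the closest competing category. The class `NaiveBayesSentenceClassifier`, its helper methods, the category names (`collection`, `sharing`, `retention`) and the training sentences are all hypothetical toy choices, not the paper's label set or data.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and split it into alphabetic word tokens (the word features)."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Hypothetical multinomial naive Bayes over bag-of-words counts, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                       # additive (Laplace) smoothing constant
        self.class_counts = Counter()            # number of training sentences per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()                       # every word seen in training

    def fit(self, sentences: list[str], labels: list[str]) -> NaiveBayesSentenceClassifier:
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_likelihood(self, word: str, label: str) -> float:
        """Smoothed log P(word | label)."""
        total = sum(self.word_counts[label].values())
        count = self.word_counts[label][word]
        return math.log((count + self.alpha) / (total + self.alpha * len(self.vocab)))

    def predict(self, sentence: str) -> str:
        """Return the category with the highest log posterior (up to a shared constant)."""
        n = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / n)  # log prior P(label)
            for word in tokenize(sentence):
                if word in self.vocab:   # words unseen in training are skipped
                    score += self.log_likelihood(word, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def top_keywords(self, label: str, k: int = 5) -> list[str]:
        """Assumed keyword score: log P(w | label) minus the largest log P(w | c) over
        the other categories c. Requires at least two categories in training."""
        others = [c for c in self.class_counts if c != label]
        scored = []
        for word in self.vocab:
            margin = self.log_likelihood(word, label) - max(
                self.log_likelihood(word, c) for c in others
            )
            scored.append((-margin, word))  # descending margin, ties broken alphabetically
        return [word for _, word in sorted(scored)[:k]]


def accuracy(clf: NaiveBayesSentenceClassifier, sentences: list[str], labels: list[str]) -> float:
    """Fraction of sentences whose predicted category matches the gold label."""
    correct = sum(clf.predict(s) == y for s, y in zip(sentences, labels))
    return correct / len(labels)


# Hypothetical toy training data; not taken from the paper.
train_sentences = [
    "We collect your email address and name when you register.",
    "We collect location data from your device.",
    "We share your information with third party advertisers.",
    "Your data may be shared with our partners.",
    "We retain your data for two years.",
    "Personal information is retained until you delete your account.",
]
train_labels = ["collection", "collection", "sharing", "sharing", "retention", "retention"]

clf = NaiveBayesSentenceClassifier().fit(train_sentences, train_labels)
print(clf.predict("We collect your phone number."))            # -> collection
print(clf.predict("Information is shared with advertisers."))  # -> sharing
print(clf.top_keywords("collection", k=1))                     # -> ['collect']
```

On data this small, function words can also rank highly: `clf.top_keywords("sharing", k=1)` returns `['with']` here, so a practical keyword evaluation would likely need stop-word filtering or far more training sentences.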

Original language: English
Host publication title: Proceedings of 2018 the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018
Publisher: Association for Computing Machinery
Pages: 62-66
Number of pages: 5
ISBN (electronic): 9781450363617
DOI: 10.1145/3199478.3199486
Publication status: Published - Mar 16 2018
Event: 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018 - Guiyang, China
Duration: Mar 16 2018 → Mar 18 2018

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018
China
Guiyang
Period: 3/16/18 → 3/18/18

Fingerprint

Learning systems
Learning algorithms
Labels
Classifiers

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Fukushima, K., Ikeda, D., Nakamura, T., & Kiyomoto, S. (2018). Challenges in classifying privacy policies by machine learning with word-based features. In Proceedings of 2018 the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018 (pp. 62-66). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3199478.3199486

Challenges in classifying privacy policies by machine learning with word-based features. / Fukushima, Keishiro; Ikeda, Daisuke; Nakamura, Toru; Kiyomoto, Shinsaku.

Proceedings of 2018 the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018. Association for Computing Machinery, 2018. p. 62-66 (ACM International Conference Proceeding Series).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Fukushima, K, Ikeda, D, Nakamura, T & Kiyomoto, S 2018, Challenges in classifying privacy policies by machine learning with word-based features. in Proceedings of 2018 the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018. ACM International Conference Proceeding Series, Association for Computing Machinery, pp. 62-66, 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018, Guiyang, China, 3/16/18. https://doi.org/10.1145/3199478.3199486
Fukushima K, Ikeda D, Nakamura T, Kiyomoto S. Challenges in classifying privacy policies by machine learning with word-based features. In: Proceedings of 2018 the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018. Association for Computing Machinery. 2018. p. 62-66. (ACM International Conference Proceeding Series). https://doi.org/10.1145/3199478.3199486
Fukushima, Keishiro ; Ikeda, Daisuke ; Nakamura, Toru ; Kiyomoto, Shinsaku. / Challenges in classifying privacy policies by machine learning with word-based features. Proceedings of 2018 the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018. Association for Computing Machinery, 2018. pp. 62-66 (ACM International Conference Proceeding Series).
@inproceedings{e1d50edb72864bbf9dea338dabd95ba5,
title = "Challenges in classifying privacy policies by machine learning with word-based features",
author = "Keishiro Fukushima and Daisuke Ikeda and Toru Nakamura and Shinsaku Kiyomoto",
year = "2018",
month = "3",
day = "16",
doi = "10.1145/3199478.3199486",
language = "English",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
pages = "62--66",
booktitle = "Proceedings of 2018 the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018",

}

TY - GEN

T1 - Challenges in classifying privacy policies by machine learning with word-based features

AU - Fukushima, Keishiro

AU - Ikeda, Daisuke

AU - Nakamura, Toru

AU - Kiyomoto, Shinsaku

PY - 2018/3/16

Y1 - 2018/3/16

UR - http://www.scopus.com/inward/record.url?scp=85052022874&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052022874&partnerID=8YFLogxK

U2 - 10.1145/3199478.3199486

DO - 10.1145/3199478.3199486

M3 - Conference contribution

AN - SCOPUS:85052022874

T3 - ACM International Conference Proceeding Series

SP - 62

EP - 66

BT - Proceedings of 2018 the 2nd International Conference on Cryptography, Security and Privacy, ICCSP 2018

PB - Association for Computing Machinery

ER -