Data analysis by positive decision trees

Kazuhisa Makino, Takashi Suda, Hirotaka Ono, Toshihide Ibaraki

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

Decision trees are a convenient means to explain given positive and negative examples, which is a form of data mining and knowledge discovery. Standard methods such as ID3 may produce non-monotonic decision trees, in the sense that data with larger values in all attributes are sometimes classified into a class with a smaller output value. (In the case of binary data, this is equivalent to saying that the discriminant Boolean function represented by the decision tree is not positive.) A motivation of this study comes from the observation that real-world data are often positive, and in such cases it is natural to build decision trees that represent positive (i.e., monotone) discriminant functions. For this, we propose how to modify existing procedures such as ID3 so that the resulting decision tree represents a positive discriminant function. In this procedure, we add some new data to recover the positivity that the original data had but lost in the process of decomposing data sets by methods such as ID3. To compare the performance of our method with existing methods, we test (1) positive data randomly generated from a hidden positive Boolean function after adding dummy attributes, and (2) breast cancer data as an example of real-world data. The experimental results on (1) show that, although positive decision trees are somewhat larger than those built without the positivity assumption, they exhibit higher accuracy and tend to choose the correct attributes on which the hidden positive Boolean function is defined. For the breast cancer data set we observe a similar tendency; i.e., positive decision trees are larger but give higher accuracy.
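The positivity (monotonicity) condition described in the abstract can be stated directly on binary labeled data: a dataset is positive when no example is componentwise dominated by another example that carries a strictly smaller class label. A minimal sketch of that check (an illustration of the condition only, not the authors' actual modified ID3 procedure):

```python
# Sketch: test whether binary labeled data are consistent with some positive
# (monotone) discriminant function, i.e. x <= y componentwise must imply
# label(x) <= label(y). This illustrates the positivity condition from the
# abstract; it is not the paper's tree-building algorithm.

def is_positive(data):
    """data: list of (attribute_vector, label) pairs with 0/1 entries."""
    for x, lx in data:
        for y, ly in data:
            dominated = all(a <= b for a, b in zip(x, y))
            if dominated and lx > ly:
                return False  # a larger example received a smaller output value
    return True

# Positive dataset: labels follow the OR of the two attributes.
print(is_positive([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]))  # True
# Non-monotone dataset: (1, 1) is classified below the dominated (0, 1).
print(is_positive([((0, 1), 1), ((1, 1), 0)]))  # False
```

A standard ID3 split can destroy this property within the resulting subsets, which is why the paper's procedure adds data to restore positivity after decomposition.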

Original language: English
Pages (from-to): 76-88
Number of pages: 13
Journal: IEICE Transactions on Information and Systems
Volume: E82-D
Issue number: 1
ISSN: 0916-8532
Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)
Publication status: Published - Jan 1 1999


All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Makino, K., Suda, T., Ono, H., & Ibaraki, T. (1999). Data analysis by positive decision trees. IEICE Transactions on Information and Systems, E82-D(1), 76-88.

