TY - GEN
T1 - Iterative data squashing for boosting based on a distribution-sensitive distance
AU - Choki, Yuta
AU - Suzuki, Einoshin
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2002
Y1 - 2002
N2 - This paper proposes, for boosting, a novel method which prevents the deterioration of accuracy inherent in data squashing methods. Boosting, which constructs a highly accurate classification model by combining multiple classification models, requires a long computation time. Data squashing, which speeds up a learning method by abstracting the training data set into a smaller data set, typically lowers accuracy. Our SB (Squashing-Boosting) loop, based on a distribution-sensitive distance, alternates data squashing and boosting, and iteratively refines an SF (Squashed-Feature) tree, which provides an appropriately squashed data set. Experimental evaluation with artificial data sets and the KDD Cup 1999 data set clearly shows the superiority of our method over conventional methods. We have also empirically evaluated our distance measure and our SF tree, and found them superior to the alternatives.
UR - http://www.scopus.com/inward/record.url?scp=84864841078&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864841078&partnerID=8YFLogxK
U2 - 10.1007/3-540-45681-3_8
DO - 10.1007/3-540-45681-3_8
M3 - Conference contribution
AN - SCOPUS:84864841078
SN - 3540440372
SN - 9783540440376
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 86
EP - 98
BT - Principles of Data Mining and Knowledge Discovery - 6th European Conference, PKDD 2002, Proceedings
A2 - Elomaa, Tapio
A2 - Mannila, Heikki
A2 - Toivonen, Hannu
PB - Springer Verlag
T2 - 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2002
Y2 - 19 August 2002 through 23 August 2002
ER -