Iterative data squashing for boosting based on a distribution-sensitive distance

Yuta Choki, Einoshin Suzuki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes a novel method for boosting that prevents the deterioration of accuracy inherent in data squashing methods. Boosting, which constructs a highly accurate classification model by combining multiple classification models, requires a long computation time. Data squashing, which speeds up a learning method by abstracting the training data set into a smaller data set, typically lowers accuracy. Our SB (Squashing-Boosting) loop, based on a distribution-sensitive distance, alternates data squashing and boosting, iteratively refining an SF (Squashed-Feature) tree that provides an appropriately squashed data set. Experimental evaluation with artificial data sets and the KDD Cup 1999 data set clearly shows the superiority of our method over conventional methods. We have also empirically evaluated our distance measure and our SF tree, and found them superior to alternatives.
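The abstract only outlines the SB loop, so the sketch below illustrates one way such a squashing-boosting alternation could be organized; it is a rough approximation under stated assumptions, not the authors' algorithm. The paper's SF tree and distribution-sensitive distance are replaced here by a plain per-class k-means squashing step under Euclidean distance, AdaBoost stands in for the boosting component, and all names (squash, sb_loop, n_regions) are hypothetical.

    # Minimal SB-loop sketch: alternate data squashing and boosting, refining
    # the squashed data set each round. The paper's SF tree and its
    # distribution-sensitive distance are approximated by per-class k-means
    # clustering with Euclidean distance; this is illustrative only.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import AdaBoostClassifier

    def squash(X, y, n_regions):
        """Abstract (X, y) into per-class centroids weighted by region size."""
        Xs, ys, ws = [], [], []
        for label in np.unique(y):
            Xc = X[y == label]
            k = min(n_regions, len(Xc))
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Xc)
            Xs.append(km.cluster_centers_)                    # one representative per region
            ys.append(np.full(k, label))
            ws.append(np.bincount(km.labels_, minlength=k))   # region sizes as weights
        return np.vstack(Xs), np.concatenate(ys), np.concatenate(ws).astype(float)

    def sb_loop(X, y, n_iterations=3, n_regions=20):
        """Alternate squashing and boosting on the squashed, weighted data."""
        model = None
        for _ in range(n_iterations):
            Xs, ys, ws = squash(X, y, n_regions)
            model = AdaBoostClassifier(n_estimators=50, random_state=0)
            model.fit(Xs, ys, sample_weight=ws)   # boosting on the squashed data
            n_regions *= 2                        # refine the squashing granularity
        return model

In the paper itself, the refinement acts on the SF tree rather than on a fixed doubling of regions; the sketch conveys only the overall loop structure of squashing followed by boosting.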

Original language: English
Title of host publication: Principles of Data Mining and Knowledge Discovery - 6th European Conference, PKDD 2002, Proceedings
Pages: 86-98
Number of pages: 13
Publication status: Published - Dec 1 2002
Event: 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2002 - Helsinki, Finland
Duration: Aug 19 2002 - Aug 23 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2431 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2002
Country: Finland
City: Helsinki
Period: 8/19/02 - 8/23/02

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Choki, Y., & Suzuki, E. (2002). Iterative data squashing for boosting based on a distribution-sensitive distance. In Principles of Data Mining and Knowledge Discovery - 6th European Conference, PKDD 2002, Proceedings (pp. 86-98). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2431 LNAI).

@inproceedings{5ab44c9b0933418a9927b2844b041cc4,
title = "Iterative data squashing for boosting based on a distribution-sensitive distance",
abstract = "This paper proposes, for boosting, a novel method which prevents deterioration of accuracy inherent to data squashing methods. Boosting, which constructs a highly accurate classification model by combining multiple classification models, requires long computational time. Data squashing, which speeds-up a learning method by abstracting the training data set to a smaller data set, typically lowers accuracy. Our SB (Squashing-Boosting) loop, based on a distribution-sensitive distance, alternates data squashing and boosting, and iteratively refines an SF (Squashed-Feature) tree, which provides an appropriately squashed data set. Experimental evaluation with artificial data sets and the KDD Cup 1999 data set clearly shows superiority of our method compared with conventional methods. We have also empirically evaluated our distance measure as well as our SF tree, and found them superior to alternatives.",
author = "Yuta Choki and Einoshin Suzuki",
year = "2002",
month = "12",
day = "1",
language = "English",
isbn = "3540440372",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "86--98",
booktitle = "Principles of Data Mining and Knowledge Discovery - 6th European Conference, PKDD 2002, Proceedings",

}
