We propose a new approach to large-scale machine learning, learning over compressed data: First compress the training data somehow and then employ various machine learning algorithms on the compressed data, with the hope that the computation time is significantly reduced when the training data is well compressed. As a first step toward this approach, we consider a variant of the Zero-Suppressed Binary Decision Diagram (ZDD) as the data structure for representing the training data, which is a generalization of the ZDD by incorporating non-determinism. For the learning algorithm to be employed, we consider a boosting algorithm called AdaBoost⁎ and its precursor AdaBoost. In this paper, we give efficient implementations of the boosting algorithms whose running times (per iteration) are linear in the size of the given ZDD.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science(all)