Scalable partial least squares regression on grammar-compressed data matrices

Yasuo Tabei, Hiroto Saigo, Yoshihiro Yamanishi, Simon J. Puglisi

Research output: Contribution to book/report › Conference contribution

7 Citations (Scopus)

Abstract

With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. Key to turning these data into knowledge is the ability to learn statistical models with high interpretability. Current methods for learning statistical models either produce models that are not interpretable or have prohibitive computational costs when applied to massive data. In this paper we address this need by presenting a scalable algorithm for partial least squares regression (PLS), which we call compression-based PLS (cPLS), to learn predictive linear models with high interpretability from massive high-dimensional data. We propose a novel grammar-compressed representation of data matrices that supports fast row and column access while the data matrix remains in compressed form. The original data matrix is grammar-compressed, and the linear model in PLS is then learned on the compressed data matrix, which results in a significant reduction in working space and greatly improves scalability. We experimentally test cPLS on its ability to learn linear models for classification, regression, and feature extraction on various massive high-dimensional datasets, and show that cPLS is superior in terms of prediction accuracy, computational efficiency, and interpretability.
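The abstract's central idea, running partial least squares directly against a grammar-compressed data matrix, can be made concrete with a small sketch. The Python below is not the authors' cPLS implementation: repair_compress, CompressedMatrix, and pls_nipals are illustrative names invented here, the compressor is a toy Re-Pair-style greedy pairer, and each row is decompressed on demand, whereas the paper's representation supports fast row and column access without materializing the matrix. This is a minimal sketch, under those assumptions, of how a NIPALS-style PLS loop can be driven through a compressed-matrix interface.

# Toy sketch (NOT the authors' cPLS): Re-Pair-style grammar compression
# of matrix rows, plus a NIPALS-style PLS1 loop that reads X only through
# the compressed representation. All names here are hypothetical.
from collections import Counter
import numpy as np

def repair_compress(seq):
    # Greedily replace the most frequent adjacent symbol pair with a new
    # nonterminal until every pair is unique (a simplified Re-Pair).
    rules = {}
    next_sym = max(seq, default=0) + 1
    seq = list(seq)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs or pairs.most_common(1)[0][1] < 2:
            break
        pair = pairs.most_common(1)[0][0]
        rules[next_sym] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym); i += 2
            else:
                out.append(seq[i]); i += 1
        seq, next_sym = out, next_sym + 1
    return seq, rules

def expand(seq, rules):
    # Expand nonterminals back to terminals (full decompression;
    # the real cPLS avoids exactly this step).
    out, stack = [], list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            a, b = rules[s]
            stack += [b, a]          # push b first so a is expanded first
        else:
            out.append(s)
    return out

class CompressedMatrix:
    # Rows stored grammar-compressed; row(i) decompresses on demand.
    def __init__(self, X):
        self.n, self.d = X.shape
        self.rows = [repair_compress(r) for r in X.astype(int).tolist()]
    def row(self, i):
        seq, rules = self.rows[i]
        return np.array(expand(seq, rules), dtype=float)

def pls_nipals(cX, y, n_components=2):
    # PLS1 via NIPALS. Deflation is kept as rank-one corrections
    # (T_list, P_list), so the compressed rows are never rewritten.
    y = np.asarray(y, dtype=float).copy()
    T_list, P_list, W = [], [], []
    def res_row(i):                  # i-th row of the deflated matrix
        r = cX.row(i)
        for t, p in zip(T_list, P_list):
            r -= t[i] * p
        return r
    for _ in range(n_components):
        w = sum(y[i] * res_row(i) for i in range(cX.n))   # X^T y
        w /= np.linalg.norm(w)
        t = np.array([res_row(i) @ w for i in range(cX.n)])
        tt = t @ t
        p = sum(t[i] * res_row(i) for i in range(cX.n)) / tt
        y -= ((y @ t) / tt) * t      # deflate the response
        T_list.append(t); P_list.append(p); W.append(w)
    return np.array(W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(50, 30))                 # binary features
    y = X[:, 0] + X[:, 3] + 0.1 * rng.standard_normal(50)
    W = pls_nipals(CompressedMatrix(X), y)
    print("first PLS weight vector:", np.round(W[0], 2))

Note that deflation is applied at read time as rank-one corrections rather than by overwriting the matrix, so the compressed rows stay read-only throughout the iterations; this loosely mirrors why a compressed, read-only representation can carry all of the PLS computation in the paper's setting.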

Original language: English
Host publication title: KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1875-1884
Number of pages: 10
ISBN (electronic): 9781450342322
DOI: 10.1145/2939672.2939864
Publication status: Published - Aug 13 2016
Event: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States
Duration: Aug 13 2016 - Aug 17 2016

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 13-17-August-2016

Other

Other: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Country: United States
City: San Francisco
Period: 8/13/16 - 8/17/16

Fingerprint

Data mining
Scalability
Feature extraction
Costs
Industry
Statistical Models

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Tabei, Y., Saigo, H., Yamanishi, Y., & Puglisi, S. J. (2016). Scalable partial least squares regression on grammar-compressed data matrices. In KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1875-1884). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 13-17-August-2016). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939864

Scalable partial least squares regression on grammar-compressed data matrices. / Tabei, Yasuo; Saigo, Hiroto; Yamanishi, Yoshihiro; Puglisi, Simon J.

KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2016. p. 1875-1884 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 13-17-August-2016).

Research output: Contribution to book/report › Conference contribution

Tabei, Y, Saigo, H, Yamanishi, Y & Puglisi, SJ 2016, Scalable partial least squares regression on grammar-compressed data matrices. in KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13-17-August-2016, Association for Computing Machinery, pp. 1875-1884, 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016, San Francisco, United States, 8/13/16. https://doi.org/10.1145/2939672.2939864
Tabei Y, Saigo H, Yamanishi Y, Puglisi SJ. Scalable partial least squares regression on grammar-compressed data matrices. In: KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2016. p. 1875-1884. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/2939672.2939864
Tabei, Yasuo ; Saigo, Hiroto ; Yamanishi, Yoshihiro ; Puglisi, Simon J. / Scalable partial least squares regression on grammar-compressed data matrices. KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2016. pp. 1875-1884 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
@inproceedings{5106432201a043c9aa74ccff55f64d37,
title = "Scalable partial least squares regression on grammar-compressed data matrices",
abstract = "With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. Key to turning these data into knowledge is the ability to learn statistical models with high interpretability. Current methods for learning statistical models either produce models that are not interpretable or have prohibitive computational costs when applied to massive data. In this paper we address this need by presenting a scalable algorithm for partial least squares regression (PLS), which we call compression-based PLS (cPLS), to learn predictive linear models with a high interpretability from massive high-dimensional data. We propose a novel grammar-compressed representation of data matrices that supports fast row and column access while the data matrix is in a compressed form. The original data matrix is grammarcompressed and then the linear model in PLS is learned on the compressed data matrix, which results in a significant reduction in working space, greatly improving scalability. We experimentally test cPLS on its ability to learn linear models for classification, regression and feature extraction with various massive high-dimensional data, and show that cPLS performs superiorly in terms of prediction accuracy, computational effciency, and interpretability.",
author = "Yasuo Tabei and Hiroto Saigo and Yoshihiro Yamanishi and Puglisi, {Simon J.}",
year = "2016",
month = "8",
day = "13",
doi = "10.1145/2939672.2939864",
language = "English",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
pages = "1875--1884",
booktitle = "KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Scalable partial least squares regression on grammar-compressed data matrices

AU - Tabei, Yasuo

AU - Saigo, Hiroto

AU - Yamanishi, Yoshihiro

AU - Puglisi, Simon J.

PY - 2016/8/13

Y1 - 2016/8/13

N2 - With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. Key to turning these data into knowledge is the ability to learn statistical models with high interpretability. Current methods for learning statistical models either produce models that are not interpretable or have prohibitive computational costs when applied to massive data. In this paper we address this need by presenting a scalable algorithm for partial least squares regression (PLS), which we call compression-based PLS (cPLS), to learn predictive linear models with high interpretability from massive high-dimensional data. We propose a novel grammar-compressed representation of data matrices that supports fast row and column access while the data matrix remains in compressed form. The original data matrix is grammar-compressed, and the linear model in PLS is then learned on the compressed data matrix, which results in a significant reduction in working space and greatly improves scalability. We experimentally test cPLS on its ability to learn linear models for classification, regression, and feature extraction on various massive high-dimensional datasets, and show that cPLS is superior in terms of prediction accuracy, computational efficiency, and interpretability.

AB - With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. Key to turning these data into knowledge is the ability to learn statistical models with high interpretability. Current methods for learning statistical models either produce models that are not interpretable or have prohibitive computational costs when applied to massive data. In this paper we address this need by presenting a scalable algorithm for partial least squares regression (PLS), which we call compression-based PLS (cPLS), to learn predictive linear models with high interpretability from massive high-dimensional data. We propose a novel grammar-compressed representation of data matrices that supports fast row and column access while the data matrix remains in compressed form. The original data matrix is grammar-compressed, and the linear model in PLS is then learned on the compressed data matrix, which results in a significant reduction in working space and greatly improves scalability. We experimentally test cPLS on its ability to learn linear models for classification, regression, and feature extraction on various massive high-dimensional datasets, and show that cPLS is superior in terms of prediction accuracy, computational efficiency, and interpretability.

UR - http://www.scopus.com/inward/record.url?scp=84984985641&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84984985641&partnerID=8YFLogxK

U2 - 10.1145/2939672.2939864

DO - 10.1145/2939672.2939864

M3 - Conference contribution

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 1875

EP - 1884

BT - KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -