TY - JOUR
T1 - Hierarchical clustered multiclass discriminant analysis via cross-validation
AU - Hirose, Kei
AU - Miura, Kanta
AU - Koie, Atori
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP19K11862 and JST-Mirai Program Grant Number JPMJMI18A2, Japan. We thank an Associate Editor and two reviewers for helpful comments and insightful suggestions that improved the quality of the paper considerably.
Publisher Copyright:
© 2022 The Authors
PY - 2023/2
Y1 - 2023/2
N2 - Linear discriminant analysis (LDA) is a well-known method for multiclass classification and dimensionality reduction. However, in general, ordinary LDA does not achieve high prediction accuracy when observations in some classes are difficult to classify. A novel cluster-based LDA method is proposed that significantly improves prediction accuracy. Hierarchical clustering is adopted, and the dissimilarity measure of two clusters is defined by the cross-validation (CV) value. Therefore, clusters are constructed such that the misclassification error rate is minimized. The proposed approach involves a heavy computational load because the CV value must be computed at each step of the hierarchical clustering algorithm. To address this issue, a regression formulation for LDA is developed and an efficient algorithm that computes an approximate CV value is constructed. The performance of the proposed method is investigated by applying it to both artificial and real datasets. The proposed method provides high prediction accuracy with fast computation from both numerical and theoretical viewpoints.
AB - Linear discriminant analysis (LDA) is a well-known method for multiclass classification and dimensionality reduction. However, in general, ordinary LDA does not achieve high prediction accuracy when observations in some classes are difficult to classify. A novel cluster-based LDA method is proposed that significantly improves prediction accuracy. Hierarchical clustering is adopted, and the dissimilarity measure of two clusters is defined by the cross-validation (CV) value. Therefore, clusters are constructed such that the misclassification error rate is minimized. The proposed approach involves a heavy computational load because the CV value must be computed at each step of the hierarchical clustering algorithm. To address this issue, a regression formulation for LDA is developed and an efficient algorithm that computes an approximate CV value is constructed. The performance of the proposed method is investigated by applying it to both artificial and real datasets. The proposed method provides high prediction accuracy with fast computation from both numerical and theoretical viewpoints.
UR - http://www.scopus.com/inward/record.url?scp=85139180049&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139180049&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2022.107613
DO - 10.1016/j.csda.2022.107613
M3 - Article
AN - SCOPUS:85139180049
SN - 0167-9473
VL - 178
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
M1 - 107613
ER -