Modified Fuzzy Gap statistic for estimating preferable number of clusters in Fuzzy k-means clustering

Chinatsu Arima, Kazumi Hakamada, Masahiro Okamoto, Taizo Hanai

Research output: Contribution to journal (article)

16 Citations (Scopus)

Abstract

In clustering methods, the estimation of the optimal number of clusters is significant for subsequent analysis. Without detailed biological information on the genes involved, the evaluation of the number of clusters becomes difficult, and we have to rely on an internal measure that is based on the distribution of the data of the clustering result. The Gap statistic has been proposed as a superior method for estimating the number of clusters in crisp clustering. In this study, we proposed a modified Fuzzy Gap statistic (MFGS) and applied it to fuzzy k-means clustering. For estimating the number of clusters, fuzzy k-means clustering with the MFGS was applied to two artificial data sets with noise and to two experimentally observed gene expression data sets. For the artificial data sets, compared with other internal measures, the MFGS showed a higher performance in terms of robustness against noise for estimating the optimal number of clusters. Moreover, it could be used to estimate the optimal number of clusters in experimental data sets. It was confirmed that the proposed MFGS is a useful method for estimating the number of clusters for microarray data sets.
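The abstract does not give the MFGS formula, so the following is only a rough sketch of the general idea it builds on: the classic Gap statistic (compare the log dispersion of the data with that of uniformly drawn reference data) with the fuzzy k-means objective swapped in as the dispersion. The function names `fuzzy_kmeans` and `fuzzy_gap`, the fuzzifier `m = 2`, the bounding-box reference distribution, and the selection rule are all assumptions made for illustration; this is not the authors' MFGS.

```python
import numpy as np


def fuzzy_kmeans(X, k, m=2.0, n_iter=100, tol=1e-6, rng=None):
    """Basic fuzzy k-means (illustrative). Returns memberships U (n, k),
    centers (k, d), and W = sum_ij u_ij^m * ||x_i - c_j||^2."""
    rng = np.random.default_rng(rng)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1

    def centers_and_d2(U):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return centers, d2

    for _ in range(n_iter):
        _, d2 = centers_and_d2(U)
        # Standard membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    centers, d2 = centers_and_d2(U)
    return U, centers, float(((U ** m) * d2).sum())


def fuzzy_gap(X, k_max=6, n_ref=10, m=2.0, seed=0):
    """Gap(k) = mean_b log W_k(reference_b) - log W_k(data), using the fuzzy
    dispersion above. Assumed rule: smallest k with
    Gap(k) >= Gap(k+1) - s_(k+1), as in the crisp Gap statistic."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(fuzzy_kmeans(X, k, m=m, rng=rng)[2])
        # Reference data: uniform over the bounding box of X (an assumption).
        ref = np.array([
            np.log(fuzzy_kmeans(rng.uniform(lo, hi, size=X.shape), k, m=m, rng=rng)[2])
            for _ in range(n_ref)
        ])
        gaps.append(float(ref.mean() - log_w))
        sks.append(float(ref.std() * np.sqrt(1.0 + 1.0 / n_ref)))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k, gaps
    return k_max, gaps
```

Calling `fuzzy_gap(X)` on an `(n_samples, n_features)` array returns the selected number of clusters together with the Gap value for each candidate k. How well this generic variant copes with noisy data is not something the abstract lets one infer; the paper's modification is what the authors report as robust.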

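Original language: English
Pages (from-to): 273-281
Number of pages: 9
Journal: Journal of Bioscience and Bioengineering
Volume: 105
Issue number: 3
DOI: 10.1263/jbb.105.273
Publication status: Published - Mar 1 2008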

Fingerprint

Cluster Analysis
Statistics
Noise
Microarrays
Gene expression
Genes
Datasets

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Bioengineering
  • Medicine(all)
  • Applied Microbiology and Biotechnology

Cite this

Modified Fuzzy Gap statistic for estimating preferable number of clusters in Fuzzy k-means clustering. / Arima, Chinatsu; Hakamada, Kazumi; Okamoto, Masahiro; Hanai, Taizo.

In: Journal of Bioscience and Bioengineering, Vol. 105, No. 3, 01.03.2008, p. 273-281.

Arima, Chinatsu ; Hakamada, Kazumi ; Okamoto, Masahiro ; Hanai, Taizo. / Modified Fuzzy Gap statistic for estimating preferable number of clusters in Fuzzy k-means clustering. In: Journal of Bioscience and Bioengineering. 2008 ; Vol. 105, No. 3. pp. 273-281.
@article{1949780111b94e7c905b6b6347916d78,
title = "Modified Fuzzy Gap statistic for estimating preferable number of clusters in Fuzzy k-means clustering",
author = "Chinatsu Arima and Kazumi Hakamada and Masahiro Okamoto and Taizo Hanai",
year = "2008",
month = "3",
day = "1",
doi = "10.1263/jbb.105.273",
language = "English",
volume = "105",
pages = "273--281",
journal = "Journal of Bioscience and Bioengineering",
issn = "1389-1723",
publisher = "The Society for Biotechnology, Japan",
number = "3",

}

TY - JOUR

T1 - Modified Fuzzy Gap statistic for estimating preferable number of clusters in Fuzzy k-means clustering

AU - Arima, Chinatsu

AU - Hakamada, Kazumi

AU - Okamoto, Masahiro

AU - Hanai, Taizo

PY - 2008/3/1

Y1 - 2008/3/1

AB - In clustering methods, the estimation of the optimal number of clusters is significant for subsequent analysis. Without detailed biological information on the genes involved, the evaluation of the number of clusters becomes difficult, and we have to rely on an internal measure that is based on the distribution of the data of the clustering result. The Gap statistic has been proposed as a superior method for estimating the number of clusters in crisp clustering. In this study, we proposed a modified Fuzzy Gap statistic (MFGS) and applied it to fuzzy k-means clustering. For estimating the number of clusters, fuzzy k-means clustering with the MFGS was applied to two artificial data sets with noise and to two experimentally observed gene expression data sets. For the artificial data sets, compared with other internal measures, the MFGS showed a higher performance in terms of robustness against noise for estimating the optimal number of clusters. Moreover, it could be used to estimate the optimal number of clusters in experimental data sets. It was confirmed that the proposed MFGS is a useful method for estimating the number of clusters for microarray data sets.

UR - http://www.scopus.com/inward/record.url?scp=41549165103&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=41549165103&partnerID=8YFLogxK

U2 - 10.1263/jbb.105.273

DO - 10.1263/jbb.105.273

M3 - Article

VL - 105

SP - 273

EP - 281

JO - Journal of Bioscience and Bioengineering

JF - Journal of Bioscience and Bioengineering

SN - 1389-1723

IS - 3

ER -