TY - JOUR
T1 - Discriminative feature of cells characterizes cell populations of interest by a small subset of genes
AU - Fujii, Takeru
AU - Maehara, Kazumitsu
AU - Fujita, Masatoshi
AU - Ohkawa, Yasuyuki
N1 - Funding Information:
This work was supported by Core Research for Evolutional Science and Technology (JPMJCR16G1 to Y.O. https://www.jst.go.jp/kisoken/crest/en/index.html), Precursory Research for Embryonic Science and Technology (JPMJPR2026 to K.M. https://www.jst.go.jp/kisoken/presto/en/index.html) and Japan Society for the Promotion of Science (JP18H04802, JP18H05527, JP19H05244, JP17H03608, JP20H00456, JP20H04846, and JP21H00232 to Y.O.; JP19H04970, JP19H03158, and JP20H05393 to K.M. https://www.jsps.go.jp/english/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2021 Fujii et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2021/11
Y1 - 2021/11
N2 - Organisms are composed of various cell types with specific states. To obtain a comprehensive understanding of the functions of organs and tissues, cell types have been classified and defined by identifying specific marker genes. Statistical tests are critical for identifying marker genes, which often involve evaluating differences in the mean expression levels of genes. Differentially expressed gene (DEG)-based analysis has been the most frequently used method of this kind. However, in association with increases in sample size such as in single-cell analysis, DEG-based analysis has faced difficulties associated with the inflation of P-values. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to using DEG-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty to perform binary classification for discriminating a population of interest and variable selection to obtain a small subset of defining genes. We demonstrated that DFC prioritized gene pairs with non-independent expression using artificial data and that DFC enabled characterization of the muscle satellite/progenitor cell population. The results revealed that DFC well captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population. DFC may complement DEG-based methods for interpreting large data sets. DEG-based analysis uses lists of genes with differences in expression between groups, while DFC, which can be termed a discriminative approach, has potential applications in the task of cell characterization. Upon recent advances in the high-throughput analysis of single cells, methods of cell characterization such as scRNA-seq can be effectively subjected to the discriminative methods.
AB - Organisms are composed of various cell types with specific states. To obtain a comprehensive understanding of the functions of organs and tissues, cell types have been classified and defined by identifying specific marker genes. Statistical tests are critical for identifying marker genes, which often involve evaluating differences in the mean expression levels of genes. Differentially expressed gene (DEG)-based analysis has been the most frequently used method of this kind. However, in association with increases in sample size such as in single-cell analysis, DEG-based analysis has faced difficulties associated with the inflation of P-values. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to using DEG-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty to perform binary classification for discriminating a population of interest and variable selection to obtain a small subset of defining genes. We demonstrated that DFC prioritized gene pairs with non-independent expression using artificial data and that DFC enabled characterization of the muscle satellite/progenitor cell population. The results revealed that DFC well captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population. DFC may complement DEG-based methods for interpreting large data sets. DEG-based analysis uses lists of genes with differences in expression between groups, while DFC, which can be termed a discriminative approach, has potential applications in the task of cell characterization. Upon recent advances in the high-throughput analysis of single cells, methods of cell characterization such as scRNA-seq can be effectively subjected to the discriminative methods.
UR - http://www.scopus.com/inward/record.url?scp=85119720303&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119720303&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1009579
DO - 10.1371/journal.pcbi.1009579
M3 - Article
C2 - 34797848
AN - SCOPUS:85119720303
VL - 17
JO - PLoS Computational Biology
JF - PLoS Computational Biology
SN - 1553-734X
IS - 11
M1 - e1009579
ER -