Principal points analysis via p-median problem for binary data

Haruka Yamashita, Yoshinobu Kawahara

Research output: Contribution to journal › Article

Abstract

Analysis with principal points is a useful statistical tool for summarizing large datasets. In this paper, we propose a subgradient-based algorithm to calculate a set of principal points for multivariate binary data by formulating it as a p-median problem. This enables us to find a globally optimal set of principal points, or an ε-optimal solution partway through the calculation, by combining the subgradient iterations with an upper bound found using the greedy method. The algorithm is an iterative procedure in which each iteration can be calculated efficiently. We investigate the applicability of the proposed framework with questionnaire data and arXiv co-author data.
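The abstract only outlines the method. As a rough illustration of the general technique it names (casting principal points of binary data as a p-median problem, bounding the optimum from below by subgradient optimization of a Lagrangian relaxation and from above by a greedy heuristic), the Python sketch below assumes Hamming distance, restricts candidate principal points to the observed data rows, measures ε as a relative gap, and uses a hypothetical step-size schedule. The function names (`hamming_matrix`, `greedy_medians`, `lagrangian_bound`, `principal_points_pmedian`) are illustrative only; this is not the authors' algorithm.

```python
import numpy as np


def hamming_matrix(X):
    """Pairwise Hamming distances between the rows of a 0/1 data matrix."""
    X = np.asarray(X, dtype=np.int8)
    return (X[:, None, :] != X[None, :, :]).sum(axis=2).astype(float)


def cost(D, medians):
    """p-median objective: every row is charged the distance to its nearest median."""
    return D[:, medians].min(axis=1).sum()


def greedy_medians(D, p):
    """Greedy heuristic: add, one at a time, the candidate that lowers the cost most."""
    chosen = []
    for _ in range(p):
        remaining = [j for j in range(D.shape[1]) if j not in chosen]
        chosen.append(min(remaining, key=lambda j: cost(D, chosen + [j])))
    return chosen


def lagrangian_bound(D, lam, p):
    """Relax the assignment constraints sum_j x_ij = 1 with multipliers lam.

    Returns the resulting lower bound, the p candidates opened by the relaxed
    problem, and a subgradient of the bound with respect to lam.
    """
    reduced = D - lam[:, None]                   # reduced costs d_ij - lam_i
    rho = np.minimum(reduced, 0.0).sum(axis=0)   # (negative) value of opening candidate j
    opened = np.argsort(rho)[:p]                 # open the p most negative rho_j
    x = reduced[:, opened] < 0                   # assign i to j only if it pays off
    lower = lam.sum() + rho[opened].sum()
    subgrad = 1.0 - x.sum(axis=1)                # violation of sum_j x_ij = 1
    return lower, opened, subgrad


def principal_points_pmedian(X, p, eps=1e-3, max_iter=500, theta=2.0, decay=0.98):
    """Return (centers, upper_bound, lower_bound) for p principal points of X.

    Assumption: candidate principal points are restricted to the observed rows.
    """
    D = hamming_matrix(X)
    if not 1 <= p <= D.shape[0]:
        raise ValueError("p must be between 1 and the number of rows")
    best = greedy_medians(D, p)                  # initial upper bound
    ub = cost(D, best)
    lb = -np.inf
    lam = np.zeros(D.shape[0])
    for _ in range(max_iter):
        lower, opened, g = lagrangian_bound(D, lam, p)
        lb = max(lb, lower)                      # any lam yields a valid lower bound
        c = cost(D, opened)                      # the opened set is also feasible
        if c < ub:
            ub, best = c, [int(j) for j in opened]
        if ub - lb <= eps * ub:                  # relative epsilon-gap reached: stop
            break
        norm2 = float(g @ g)
        if norm2 == 0.0:                         # relaxed solution already feasible
            break
        lam = lam + theta * (ub - lower) / norm2 * g   # Polyak-type subgradient step
        theta *= decay                           # assumed (hypothetical) step schedule
    return np.asarray(X)[best], ub, lb


if __name__ == "__main__":
    # Toy binary data: two loose groups of 4-item response patterns.
    X = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0],
         [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1]]
    centers, ub, lb = principal_points_pmedian(X, p=2)
    print(centers, ub, lb)
```

On this toy data the two bounds meet at a total Hamming distance of 4, which certifies the greedy choice of 0000 and 1111 as optimal. Because any multiplier vector gives a valid lower bound, the loop can stop as soon as the gap is small enough, which is the sense in which an ε-optimal answer is available partway through the calculation.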

Original language: English
Journal: Journal of Applied Statistics
DOI: 10.1080/02664763.2019.1675605
Publication status: Accepted/In press - Jan 1 2019
Externally published: Yes

Fingerprint

Principal Points
P-median Problem
Binary Data
Subgradient
Large Data
Multivariate Data
Iterative Procedure
Questionnaire
Optimal Solution
Upper bound
Iteration
Calculate
P-median

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Principal points analysis via p-median problem for binary data. / Yamashita, Haruka; Kawahara, Yoshinobu.

In: Journal of Applied Statistics, 01.01.2019.

Research output: Contribution to journal › Article

@article{6e1f93b2427f4dce8eaf96a3cb7547d2,
title = "Principal points analysis via p-median problem for binary data",
abstract = "Analysis with principal points is a useful statistical tool for summarizing large datasets. In this paper, we propose a subgradient-based algorithm to calculate a set of principal points for multivariate binary data by formulating it as a p-median problem. This enables us to find a globally optimal set of principal points, or an ε-optimal solution partway through the calculation, by combining the subgradient iterations with an upper bound found using the greedy method. The algorithm is an iterative procedure in which each iteration can be calculated efficiently. We investigate the applicability of the proposed framework with questionnaire data and arXiv co-author data.",
author = "Haruka Yamashita and Yoshinobu Kawahara",
year = "2019",
month = "1",
day = "1",
doi = "10.1080/02664763.2019.1675605",
language = "English",
journal = "Journal of Applied Statistics",
issn = "0266-4763",
publisher = "Routledge",

}

TY  - JOUR
T1  - Principal points analysis via p-median problem for binary data
AU  - Yamashita, Haruka
AU  - Kawahara, Yoshinobu
PY  - 2019/1/1
Y1  - 2019/1/1
N2  - Analysis with principal points is a useful statistical tool for summarizing large datasets. In this paper, we propose a subgradient-based algorithm to calculate a set of principal points for multivariate binary data by formulating it as a p-median problem. This enables us to find a globally optimal set of principal points, or an ε-optimal solution partway through the calculation, by combining the subgradient iterations with an upper bound found using the greedy method. The algorithm is an iterative procedure in which each iteration can be calculated efficiently. We investigate the applicability of the proposed framework with questionnaire data and arXiv co-author data.
AB  - Analysis with principal points is a useful statistical tool for summarizing large datasets. In this paper, we propose a subgradient-based algorithm to calculate a set of principal points for multivariate binary data by formulating it as a p-median problem. This enables us to find a globally optimal set of principal points, or an ε-optimal solution partway through the calculation, by combining the subgradient iterations with an upper bound found using the greedy method. The algorithm is an iterative procedure in which each iteration can be calculated efficiently. We investigate the applicability of the proposed framework with questionnaire data and arXiv co-author data.
UR  - http://www.scopus.com/inward/record.url?scp=85074012296&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85074012296&partnerID=8YFLogxK
U2  - 10.1080/02664763.2019.1675605
DO  - 10.1080/02664763.2019.1675605
M3  - Article
AN  - SCOPUS:85074012296
JO  - Journal of Applied Statistics
JF  - Journal of Applied Statistics
SN  - 0266-4763
ER  - 