Collaborative agglomerative document clustering with limited information disclosure

Chunhua Su, Jianying Zhou, Feng Bao, Tsuyoshi Takagi, Kouichi Sakurai

Research output: Contribution to journal (Article)

Abstract

Document clustering is a practical and powerful data mining technique for analyzing large collections of text or hypertext documents. However, it also risks leaking sensitive information, especially when it is executed in a distributed environment. In this paper, we propose a cryptography-based framework for privacy-preserving document clustering among users in a distributed environment: two parties, each holding a private document database, want to collaboratively execute agglomerative document clustering without disclosing their private contents. We provide two implementations of this framework. One offers higher precision and stronger security but requires more computational resources; the other is a simplified version with lower computational complexity and higher processing speed. Additionally, we provide security proofs and an experimental analysis of the precision and scalability of our proposal.
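
As a point of reference for the agglomerative clustering step named in the abstract, here is a minimal plaintext sketch in Python. It is not the authors' protocol: it uses no cryptography and assumes all documents sit in one place, whereas the paper's setting splits them between two parties who must not reveal their contents. The function names (`tf_vector`, `cosine`, `agglomerative`), the whitespace tokenizer, cosine similarity, and average linkage are all illustrative assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a document (assumed whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(docs, k):
    """Merge the two most similar clusters until only k clusters remain.

    Cluster similarity is average linkage over pairwise cosine similarity
    (an assumption for illustration). Returns a sorted list of clusters,
    each a sorted list of document indices.
    """
    k = max(k, 1)  # at least one cluster must remain
    vecs = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start with singletons

    def linkage(a, b):
        total = sum(cosine(vecs[i], vecs[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        best = None  # (similarity, index_x, index_y) of the best pair so far
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


docs = [
    "secure key exchange protocol",
    "key exchange protocol proof",
    "cluster text documents",
    "cluster hypertext documents",
]
result = agglomerative(docs, 2)
print(result)  # [[0, 1], [2, 3]]
```

In a two-party setting, the calls to `cosine` across documents owned by different parties are the steps that would leak content if computed in the clear; those are the computations a privacy-preserving framework would need to protect.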

Original language: English
Pages (from-to): 964-978
Number of pages: 15
Journal: Security and Communication Networks
Volume: 7
Issue number: 6
DOIs: 10.1002/sec.811
Publication status: Published - Jan 1 2014

Fingerprint

Cryptography
Data mining
Scalability
Computational complexity
Processing

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Networks and Communications

Cite this

Collaborative agglomerative document clustering with limited information disclosure. / Su, Chunhua; Zhou, Jianying; Bao, Feng; Takagi, Tsuyoshi; Sakurai, Kouichi.

In: Security and Communication Networks, Vol. 7, No. 6, 01.01.2014, p. 964-978.

Su, Chunhua ; Zhou, Jianying ; Bao, Feng ; Takagi, Tsuyoshi ; Sakurai, Kouichi. / Collaborative agglomerative document clustering with limited information disclosure. In: Security and Communication Networks. 2014 ; Vol. 7, No. 6. pp. 964-978.
@article{c86a49b13b87408dbd218008e1f437c8,
title = "Collaborative agglomerative document clustering with limited information disclosure",
author = "Chunhua Su and Jianying Zhou and Feng Bao and Tsuyoshi Takagi and Kouichi Sakurai",
year = "2014",
month = "1",
day = "1",
doi = "10.1002/sec.811",
language = "English",
volume = "7",
pages = "964--978",
journal = "Security and Communication Networks",
issn = "1939-0122",
publisher = "John Wiley and Sons Inc.",
number = "6",

}

TY - JOUR

T1 - Collaborative agglomerative document clustering with limited information disclosure

AU - Su, Chunhua

AU - Zhou, Jianying

AU - Bao, Feng

AU - Takagi, Tsuyoshi

AU - Sakurai, Kouichi

PY - 2014/1/1

Y1 - 2014/1/1

UR - http://www.scopus.com/inward/record.url?scp=84899100799&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84899100799&partnerID=8YFLogxK

U2 - 10.1002/sec.811

DO - 10.1002/sec.811

M3 - Article

AN - SCOPUS:84899100799

VL - 7

SP - 964

EP - 978

JO - Security and Communication Networks

JF - Security and Communication Networks

SN - 1939-0122

IS - 6

ER -