Constructing projection frequent pattern tree for efficient mining

Jian Wen Xiang, Yan Xiang He, Futatsugi Kokichi, Weiqiang Kong

Research output: Contribution to journal (Article)

Abstract

Frequent pattern mining plays an essential role in data mining. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there are prolific patterns and/or long patterns. We introduce a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short frequent patterns without candidate generation. We also build a new projection frequent pattern tree (PFP-tree) algorithm, which not only inherits all the advantages of the FP-growth method but also avoids its bottleneck: the dependence on database size when constructing the frequent pattern tree (FP-tree). Mining efficiency is achieved by introducing a projection technique that avoids serially scanning each frequent item in the database. The cost is mainly related to the depth of the tree, namely the number of frequent items in the longest transaction in the database, rather than the sum of all the frequent items in the database, which greatly shortens tree-construction time. Our performance study shows that the PFP-tree method is efficient and scalable for mining large databases or data warehouses, and is about an order of magnitude faster than the FP-growth method.
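
For orientation, the following is a minimal sketch of the conventional two-scan FP-tree construction that the abstract takes as its baseline. It is not the authors' PFP-tree algorithm, whose projection step is not specified in this record. The names `FPNode`, `build_fp_tree`, and `tree_depth` are hypothetical, and the support-descending item order with alphabetical tie-breaking is an assumption made for illustration.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a plain FP-tree with two passes over the transactions.

    Assumption: this is the conventional construction, shown only as a
    baseline; it does not implement the PFP-tree projection technique.
    """
    # Pass 1: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Pass 2: insert each transaction's frequent items, ordered by
    # descending support (ties broken alphabetically), as a path.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def tree_depth(node):
    """Length of the longest root-to-leaf path, excluding the root."""
    return max((1 + tree_depth(c) for c in node.children.values()), default=0)


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
    root, frequent = build_fp_tree(data, min_support=2)
    print(sorted(frequent.items()), tree_depth(root))
```

In this sketch the tree depth equals the number of frequent items in the longest transaction, which is the quantity the abstract says dominates the PFP-tree construction cost.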

Original language: English
Pages (from-to): 351-357
Number of pages: 7
Journal: Wuhan University Journal of Natural Sciences
Volume: 8
Issue number: 2 A
Publication status: Published - Jun 1 2003

Fingerprint

Data warehouses
Data mining
Costs

All Science Journal Classification (ASJC) codes

  • General

Cite this

Xiang, J. W., He, Y. X., Kokichi, F., & Kong, W. (2003). Constructing projection frequent pattern tree for efficient mining. Wuhan University Journal of Natural Sciences, 8(2 A), 351-357.

@article{75fb273f042041e88c71adeaa870de6b,
title = "Constructing projection frequent pattern tree for efficient mining",
author = "Xiang, {Jian Wen} and He, {Yan Xiang} and Futatsugi Kokichi and Weiqiang Kong",
year = "2003",
month = "6",
day = "1",
language = "English",
volume = "8",
pages = "351--357",
journal = "Wuhan University Journal of Natural Sciences",
issn = "1007-1202",
publisher = "Wuhan University",
number = "2 A",

}

TY - JOUR

T1 - Constructing projection frequent pattern tree for efficient mining

AU - Xiang, Jian Wen

AU - He, Yan Xiang

AU - Kokichi, Futatsugi

AU - Kong, Weiqiang

PY - 2003/6/1

Y1 - 2003/6/1


UR - http://www.scopus.com/inward/record.url?scp=0141456385&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0141456385&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0141456385

VL - 8

SP - 351

EP - 357

JO - Wuhan University Journal of Natural Sciences

JF - Wuhan University Journal of Natural Sciences

SN - 1007-1202

IS - 2 A

ER -