ProUM: Projection-based Utility Mining on Sequence Data

Gan, Wensheng; Lin, Jerry Chun-Wei; Zhang, Jiexiong; Chao, Han-Chieh; Fujita, Hamido; Yu, Philip S.

doi:10.1016/j.ins.2019.10.033

arXiv:1904.07764 (cs)

[Submitted on 16 Apr 2019 (v1), last revised 12 Sep 2019 (this version, v2)]

Abstract: Utility is an important concept in economics. A variety of applications consider utility in real-life situations, which has led to the emergence of utility-oriented mining (also called utility mining) in the recent decade. Utility mining has attracted a great deal of attention, but most existing studies have been developed to deal with itemset-based data. Time-ordered sequence data, which differs from itemset-based data, is more common in real-world situations. Current utility mining algorithms still have limitations when dealing with sequence data, since they are time-consuming and require a large amount of memory. In addition, the efficiency of utility mining on sequence data still needs to be improved, especially for long sequences or when the minimum utility threshold is low. In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. A utility-array structure is designed to store the necessary sequence-order and utility information. By using the projection technique to generate the utility-array, ProUM significantly improves mining efficiency and effectively reduces memory consumption. Furthermore, a new upper bound named sequence extension utility is proposed, and several pruning strategies are applied to further improve the efficiency of ProUM. By taking utility theory into account, the derived high-utility sequential patterns provide more insightful and interesting information than other kinds of patterns. Experimental results showed that the proposed ProUM algorithm significantly outperformed the state-of-the-art algorithms in terms of execution time, memory usage, and scalability.
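To make the abstract's terms concrete, here is a minimal Python sketch, assuming the usual quantitative-sequence model from the high-utility sequential pattern mining literature: each itemset maps an item to a precomputed utility (quantity times unit profit), and the utility of a pattern in a sequence is the maximum over all of its occurrences. It also computes the classic sequence-weighted utility (SWU) upper bound used for pruning. This is not ProUM itself: it builds no utility-array or projected database, and SWU is a looser bound than the paper's sequence extension utility. All function names, the threshold, and the toy database are hypothetical.

```python
import math
from functools import lru_cache
from typing import Dict, FrozenSet, List, Optional

# Hypothetical q-sequence: ordered itemsets, each mapping item -> utility
# (assumed precomputed as internal quantity x external unit utility).
QSequence = List[Dict[str, int]]
# A sequential pattern: ordered itemsets of items, without utilities.
Pattern = List[FrozenSet[str]]


def utility_in_sequence(pattern: Pattern, seq: QSequence) -> Optional[int]:
    """Max utility of `pattern` over all its occurrences in `seq`, or None."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> float:
        # Best utility for matching pattern[i:] into itemsets seq[j:].
        if i == len(pattern):
            return 0
        if j == len(seq):
            return -math.inf  # pattern elements remain, no itemsets left
        skip = best(i, j + 1)  # do not match pattern[i] at itemset j
        element = seq[j]
        if pattern[i].issubset(element):
            take = sum(element[x] for x in pattern[i]) + best(i + 1, j + 1)
            return max(skip, take)
        return skip

    result = best(0, 0)
    return None if result == -math.inf else int(result)


def pattern_utility(pattern: Pattern, db: List[QSequence]) -> int:
    """Total utility: sum of per-sequence utilities where the pattern occurs."""
    total = 0
    for seq in db:
        u = utility_in_sequence(pattern, seq)
        if u is not None:
            total += u
    return total


def sequence_utility(seq: QSequence) -> int:
    """Utility of a whole sequence: sum of all its item utilities."""
    return sum(sum(element.values()) for element in seq)


def swu(pattern: Pattern, db: List[QSequence]) -> int:
    """Sequence-weighted utility: an upper bound on the utility of the
    pattern and of every pattern that extends it."""
    return sum(sequence_utility(s) for s in db
               if utility_in_sequence(pattern, s) is not None)


# Hypothetical toy database of three q-sequences.
DB: List[QSequence] = [
    [{"a": 2, "b": 3}, {"c": 4}, {"a": 1, "c": 2}],
    [{"b": 5}, {"a": 3, "c": 1}],
    [{"a": 4}, {"b": 2}],
]
A_THEN_C: Pattern = [frozenset({"a"}), frozenset({"c"})]
AC_TOGETHER: Pattern = [frozenset({"a", "c"})]
B_THEN_C: Pattern = [frozenset({"b"}), frozenset({"c"})]

if __name__ == "__main__":
    min_util = 13  # hypothetical minimum utility threshold
    for name, p in [("<a, c>", A_THEN_C), ("<(a c)>", AC_TOGETHER),
                    ("<b, c>", B_THEN_C)]:
        u, bound = pattern_utility(p, DB), swu(p, DB)
        print(name, "utility:", u, "SWU:", bound,
              "high-utility:", u >= min_util, "prunable:", bound < min_util)
```

With `min_util = 13`, `<a, c>` has utility 6 and SWU 12, so it and all its extensions can be pruned; `<(a c)>` (utility 7, SWU 21) is not high-utility but must still be extended; `<b, c>` (utility 13) is high-utility. A tighter bound such as the paper's sequence extension utility would allow more candidates to be pruned earlier.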

Comments:	Elsevier Information Science, 17 pages, 4 figures
Subjects:	Databases (cs.DB); Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1904.07764 [cs.DB]
	(or arXiv:1904.07764v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1904.07764
Journal reference:	Information Science, 2020
Related DOI:	https://doi.org/10.1016/j.ins.2019.10.033

Submission history

From: Wensheng Gan
[v1] Tue, 16 Apr 2019 15:34:40 UTC (1,850 KB)
[v2] Thu, 12 Sep 2019 09:18:50 UTC (512 KB)