Using hashing and lexicographic order for Frequent Itemsets Mining on data streams

Bustio L, Letras M, Cumplido R, Hernández-León R, Feregrino C, Bande JM (2019)


Publication Type: Journal article, Original article

Publication year: 2019

Journal: Journal of Parallel and Distributed Computing

Book Volume: 125

Pages Range: 58-71

DOI: 10.1016/j.jpdc.2018.11.002

Abstract

Frequent Itemsets Mining is a Data Mining technique that has been employed to extract useful knowledge from datasets and, more recently, from data streams. Data streams are unbounded flows of data that arrive at high rates and cannot be stored for off-line processing; therefore, algorithms proposed for Frequent Itemsets Mining from datasets cannot be applied directly to data streams. Frequent Itemsets Mining is a compute-intensive task, so developing custom hardware-based architectures to speed it up is an active research topic. This paper introduces an algorithm for hardware-based Frequent Itemsets Mining on data streams that uses detection of the top-k frequent 1-itemsets as preprocessing. The received transactions are handled using hash functions, and the lexicographic order of items is used to obtain the frequent itemsets. The proposed algorithm focuses on discovering frequent itemsets in data streams composed of short transactions over large alphabets. Experimental results show that the proposed algorithm achieves shorter processing times than the state-of-the-art algorithms used as the baseline.
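
The three ingredients named in the abstract (top-k 1-itemset preprocessing, hashing, and lexicographic ordering) can be illustrated with a small software sketch. This is not the paper's hardware algorithm, whose details are not given here. It is a minimal Python illustration under assumptions: the stream is processed one in-memory window at a time, a Python `Counter` stands in for the hash functions, and the names `frequent_itemsets`, `window`, `k`, and `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(window, k, min_support):
    """Illustrative sketch only: mine one window (a list of transactions).

    Assumption: the window fits in memory; a real stream miner would
    update its counts incrementally as transactions arrive.
    """
    # Preprocessing: keep only the k most frequent single items.
    item_counts = Counter(item for t in window for item in set(t))
    top_items = {item for item, _ in item_counts.most_common(k)}

    # Hash table keyed by tuples of items in lexicographic order, so each
    # itemset has exactly one canonical key regardless of arrival order.
    support = Counter()
    for t in window:
        kept = sorted(set(t) & top_items)
        # Enumerating all subsets is only practical for short transactions.
        for size in range(1, len(kept) + 1):
            for itemset in combinations(kept, size):
                support[itemset] += 1

    # Report the itemsets that meet the support threshold.
    return {s: c for s, c in support.items() if c >= min_support}
```

Sorting each filtered transaction before enumerating its subsets means that `("a", "b")` and `("b", "a")` are never counted separately. The subset enumeration grows exponentially with transaction length, which is why a sketch like this only suits the short-transaction setting the abstract targets.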

How to cite

APA:

Bustio, L., Letras, M., Cumplido, R., Hernández-León, R., Feregrino, C., & Bande, J. M. (2019). Using hashing and lexicographic order for Frequent Itemsets Mining on data streams. Journal of Parallel and Distributed Computing, 125, 58-71. https://doi.org/10.1016/j.jpdc.2018.11.002

MLA:

Bustio, Lazaro, et al. "Using hashing and lexicographic order for Frequent Itemsets Mining on data streams." Journal of Parallel and Distributed Computing 125 (2019): 58-71.
