P-Mine: Parallel itemset mining on large datasets / Baralis, Elena Maria; Cerquitelli, Tania; Chiusano, Silvia Anna; Grand, Alberto. - Print. - 1:(2013), pp. 266-271. (Paper presented at the 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), held in Brisbane, Queensland (Australia), April 8-12, 2013) [10.1109/ICDEW.2013.6547461].

P-Mine: Parallel itemset mining on large datasets

Baralis, Elena Maria; Cerquitelli, Tania; Chiusano, Silvia Anna; Grand, Alberto
2013

Abstract

Itemset mining is a well-known exploratory technique used to discover interesting correlations hidden in a data collection. Since ever-increasing amounts of data are being collected and stored (e.g., business transactions, medical and biological data, context-aware applications), scalable and efficient approaches are needed to analyze these large data collections. This paper proposes a parallel disk-based approach that efficiently supports frequent itemset mining on a multi-core processor. Our parallel strategy is presented in the context of the VLDB-Mine persistent data structure. Different techniques have been proposed to optimize both data- and compute-intensive aspects of the mining algorithm. Preliminary experiments, performed on both real and synthetic datasets, show promising results in improving the efficiency and scalability of the mining activity on large datasets.
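For readers unfamiliar with the task, the following is a minimal illustrative sketch of frequent itemset counting over partitioned transactions. It is not the P-Mine algorithm and does not model the VLDB-Mine data structure: it is a brute-force, in-memory count, and the function names (`count_partition`, `frequent_itemsets`), the `workers` parameter, and the sample data are all hypothetical. Threads are used only to show the partition-then-merge structure; in CPython they would not by themselves give a multi-core speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_partition(transactions, size):
    """Count every itemset of the given size within one partition."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort items so each itemset has one canonical form.
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts


def frequent_itemsets(transactions, min_support, size, workers=2):
    """Return itemsets of `size` found in at least `min_support` transactions.

    Hypothetical helper for illustration only, not the paper's method.
    """
    # Split the transactions into contiguous partitions, roughly one per worker.
    chunk = max(1, -(-len(transactions) // workers))  # ceiling division
    partitions = [transactions[i:i + chunk]
                  for i in range(0, len(transactions), chunk)]

    # Count each partition independently, then merge the partial counts.
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda part: count_partition(part, size), partitions):
            total.update(partial)

    # Keep only itemsets whose support reaches the threshold.
    return {itemset: n for itemset, n in total.items() if n >= min_support}


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]
result = frequent_itemsets(transactions, min_support=3, size=2)
```

With this sample data, only the pair `("bread", "milk")` appears in three transactions, so it is the only itemset of size 2 that meets a minimum support of 3.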
ISBN: 978-1-4673-5304-5

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2518607