Frequent Itemsets Mining for Big Data: A Comparative Analysis

Apiletti, Daniele; Baralis, Elena Maria; Cerquitelli, Tania; Garza, Paolo; Pulvirenti, Fabio; Venturini, Luca

doi:10.1016/j.bdr.2017.06.006

Itemset mining is a well-known exploratory data mining technique used to discover interesting correlations hidden in a data collection. Since it supports different targeted analyses, it is profitably exploited in a wide range of different domains, ranging from network traffic data to medical records. With the increasing amount of generated data, different scalable algorithms have been developed, exploiting the advantages of distributed computing frameworks, such as Apache Hadoop and Spark. This paper reviews Hadoop- and Spark-based scalable algorithms addressing the frequent itemset mining problem in the Big Data domain through both theoretical and experimental comparative analyses. Since the itemset mining task is computationally expensive, its distribution and parallelization strategies heavily affect memory usage, load balancing, and communication costs. A detailed discussion of the algorithmic choices of the distributed methods for frequent itemset mining is followed by an experimental analysis comparing the performance of state-of-the-art distributed implementations on both synthetic and real datasets. The strengths and weaknesses of the algorithms are thoroughly discussed with respect to the dataset features (e.g., data distribution, average transaction length, number of records), and specific parameter settings. Finally, based on theoretical and experimental analyses, open research directions for the parallelization of the itemset mining problem are presented.
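As a minimal illustration of the frequent itemset mining problem named in the abstract, the sketch below counts how many transactions contain each itemset and keeps those that reach a minimum support threshold. It is a naive single-machine enumeration, not one of the Hadoop- or Spark-based algorithms the paper reviews; the function name `frequent_itemsets`, the `min_support` parameter, and the toy transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    `min_support` transactions (hypothetical helper, brute force)."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        # Enumerate every non-empty subset of the transaction.
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Toy dataset, invented for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

Each transaction with n distinct items contributes 2^n - 1 subsets here, which is one way to see why the average transaction length weighs so heavily on cost and why practical algorithms prune the search space and distribute the work.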

Frequent Itemsets Mining for Big Data: A Comparative Analysis / Apiletti, Daniele; Baralis, Elena Maria; Cerquitelli, Tania; Garza, Paolo; Pulvirenti, Fabio; Venturini, Luca. - In: Big Data Research. - ISSN 2214-5796. - Print. - 9:C(2017), pp. 67-83. [10.1016/j.bdr.2017.06.006]

Year: 2017
DOI: https://dx.doi.org/10.1016/j.bdr.2017.06.006
Journal title: Big Data Research
Publication type: 1.1 Journal article

Files in this record:

File	Description	Size	Format
survey_itemset (1).pdf	Article; 2. Post-print / Author's Accepted Manuscript; open access since 25/08/2019; license: Creative Commons	4.88 MB	Adobe PDF
1-s2.0-S2214579616300193-main.pdf	2a. Post-print, publisher's version / Version of Record; not available; license: not public (private/restricted access)	4.58 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2680344
