Misleading generalized itemset mining in the cloud / Baralis, Elena Maria; Cagliero, Luca; Cerquitelli, Tania; Chiusano, Silvia Anna; Garza, Paolo; Grimaudo, Luigi; Pulvirenti, Fabio. - (2014), pp. 211-216. (Paper presented at the 12th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA'14), held in Milan, 26-28 August 2014) [10.1109/ISPA.2014.36].

Misleading generalized itemset mining in the cloud

Baralis, Elena Maria; Cagliero, Luca; Cerquitelli, Tania; Chiusano, Silvia Anna; Garza, Paolo; Grimaudo, Luigi; Pulvirenti, Fabio
2014

Abstract

In the era of smart cities, huge volumes of data are continuously generated and collected, prompting the need for efficient, distributed data mining approaches. Generalized itemset mining is an established data mining technique that discovers multiple-level patterns hidden in the analyzed data by exploiting analyst-provided taxonomies. Among the generalized itemsets, the most peculiar high-level patterns are those with many contrasting correlations among items at different abstraction levels. They represent misleading situations that experts should analyze separately during manual inspection. This paper proposes a novel cloud-based service, named MGI-CLOUD, to efficiently mine misleading multiple-level patterns, i.e., the Misleading Generalized Itemsets, in a distributed computing environment. MGI-CLOUD consists of a set of distributed MapReduce jobs running in the cloud. As a case study, the system has been contextualized in a real-life scenario, i.e., the analysis of traffic law infractions committed in a smart city environment. The experiments, performed on real datasets, demonstrate the efficiency and effectiveness of MGI-CLOUD.
Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2557563