
A Lazy Approach to Associative Classification / Baralis, Elena Maria; Chiusano, Silvia Anna; Garza, Paolo. - In: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING. - ISSN 1041-4347. - PRINT. - 20(2):(2008), pp. 156-171. [10.1109/TKDE.2007.190677]

A Lazy Approach to Associative Classification

Baralis, Elena Maria; Chiusano, Silvia Anna; Garza, Paolo
2008

Abstract

Associative classification is a promising technique for building accurate classifiers. However, in large or highly correlated data sets, association rule mining may yield huge rule sets. Hence, several pruning techniques have been proposed to select a small subset of high-quality rules. Since the availability of a "rich" rule set may improve the accuracy of the classifier, we argue that rule pruning should be reduced to a minimum. The L3 associative classifier is built by means of a lazy pruning technique that discards only those rules that exclusively misclassify training data. The classification of unlabeled data is performed in two steps. A small subset of high-quality rules is considered first. When this set cannot classify the data, a larger rule set is exploited. This second set includes rules usually discarded by previous approaches. To cope with the need to mine large rule sets and to use them efficiently for classification, a compact form is proposed that represents a complete rule set in a space-efficient way and without information loss. An extensive experimental evaluation on real and synthetic data sets shows that L3 improves classification accuracy over previous approaches.
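The two-step classification described in the abstract can be illustrated with a minimal sketch. The `Rule` representation, the confidence/support ordering, and the names `level1`/`level2` are illustrative assumptions, not the paper's actual data structures or API; L3's compact rule-set representation is not modeled here.

```python
from dataclasses import dataclass

# Hypothetical sketch of L3-style two-step lazy classification.
@dataclass
class Rule:
    antecedent: frozenset   # items that must all appear in the record
    label: str              # predicted class
    confidence: float
    support: float

def matches(rule: Rule, record: frozenset) -> bool:
    """A rule applies when its antecedent is contained in the record."""
    return rule.antecedent <= record

def classify(record: frozenset, level1: list, level2: list,
             default: str = "unknown") -> str:
    """Try the small set of high-quality rules first; only when it
    cannot classify the record, fall back to the larger set of rules
    that other approaches would usually have pruned."""
    for rule_set in (level1, level2):
        # Assumed ordering: higher confidence first, ties broken by support.
        for rule in sorted(rule_set,
                           key=lambda r: (-r.confidence, -r.support)):
            if matches(rule, record):
                return rule.label
    return default
```

For example, a record matching no level-1 rule falls through to the larger level-2 set, which is exactly the step where rules discarded by eager pruning can still contribute to accuracy.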
Files for this record:
No files are associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/1648909
Warning

Warning! The displayed data have not been validated by the university.