A Systematic Literature Review (SLR) identifies, evaluates, and synthesizes the literature available for a given topic. This generally requires a significant human workload and has subjectivity bias that could affect the results of such a review. Automated document classification can be a valuable tool for recommending the selection of studies. In this article, we propose an automated pre-selection approach based on text mining and semantic enrichment techniques. Each document is firstly processed by a named entity extractor. The DBpedia URIs coming from the entity linking process are used as external sources of information. Our system collects the bag of words of those sources and it adds them to the initial document. A Multinomial Naive Bayes classifier discriminates whether the enriched document belongs to the positive example set or not. We used an existing manually performed SLR as benchmark data set. We trained our system with different configurations of relevant documents and we tested the goodness of our approach with an empirical assessment. Results show a reduction of the manual workload of 18% that a human researcher has to spend, while holding a remarkable 95% of recall, important condition for the nature itself of SLRs. We measure the effect of the enrichment process to the precision of the classifier and we observed a gain up to 5%.

Semantic Enrichment for Recommendation of Primary Studies in a Systematic Literature Review / Rizzo, Giuseppe; Tomassetti, FEDERICO CESARE ARGENTINO; Vetro', Antonio; Ardito, Luca; Torchiano, Marco; Morisio, Maurizio; Troncy, Raphael. - In: DIGITAL SCHOLARSHIP IN THE HUMANITIES. - ISSN 2055-7671. - STAMPA. - 32:1(2017), pp. 195-208. [10.1093/llc/fqv031]

Semantic Enrichment for Recommendation of Primary Studies in a Systematic Literature Review

RIZZO, GIUSEPPE;TOMASSETTI, FEDERICO CESARE ARGENTINO;VETRO', ANTONIO;ARDITO, LUCA;TORCHIANO, MARCO;MORISIO, MAURIZIO;
2017

Abstract

A Systematic Literature Review (SLR) identifies, evaluates, and synthesizes the literature available for a given topic. This generally requires a significant human workload and has subjectivity bias that could affect the results of such a review. Automated document classification can be a valuable tool for recommending the selection of studies. In this article, we propose an automated pre-selection approach based on text mining and semantic enrichment techniques. Each document is firstly processed by a named entity extractor. The DBpedia URIs coming from the entity linking process are used as external sources of information. Our system collects the bag of words of those sources and it adds them to the initial document. A Multinomial Naive Bayes classifier discriminates whether the enriched document belongs to the positive example set or not. We used an existing manually performed SLR as benchmark data set. We trained our system with different configurations of relevant documents and we tested the goodness of our approach with an empirical assessment. Results show a reduction of the manual workload of 18% that a human researcher has to spend, while holding a remarkable 95% of recall, important condition for the nature itself of SLRs. We measure the effect of the enrichment process to the precision of the classifier and we observed a gain up to 5%.
File in questo prodotto:
File Dimensione Formato  
semanticreview.pdf

Open Access dal 14/08/2017

Descrizione: Post print articolo
Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 712.07 kB
Formato Adobe PDF
712.07 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2617310
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo