The availability of annotated multimedia content is a crucial requirement for a number of applications. In education, it could support the automatic summarization of recorded lessons or the retrieval of learning material. In entertainment, it could serve to recommend audio and video resources based on users’ attitudes. This work presents a framework that augments the video viewing experience on mobile devices by means of image- and text-based annotations extracted on demand from Wikipedia. Speech recognition is used to periodically obtain text snippets from the audio track of the video currently displayed on the mobile device, while query-by-image search is used to generate a text summary of extracted video frames. The keywords obtained are processed with semantic techniques to find named entities associated with the multimedia content, which are then superimposed on the video and displayed to the user in a synchronized way. Promising results obtained with a prototype implementation showed the feasibility of the proposed solution, which could be combined with other systems (e.g., those providing information about the user’s location or preferences) to build more sophisticated context-aware applications.
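
The synchronized display step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Annotation` fields, the `active_annotations` function, and the 10-second display window are all assumptions introduced here for illustration. The sketch only shows one plausible way to pick which already-extracted named entities to overlay at a given playback time.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # All fields are hypothetical; the record does not specify a data model.
    start_s: float  # playback time (seconds) at which the entity was detected
    entity: str     # named entity, e.g. a Wikipedia article title
    source: str     # "audio" (speech recognition) or "image" (query-by-image)


def active_annotations(annotations, playback_s, window_s=10.0):
    """Return annotations detected in the interval (playback_s - window_s, playback_s].

    The window length is an assumed parameter, not a value from the paper.
    """
    # Sort by detection time so the window bounds can be found by binary search.
    ordered = sorted(annotations, key=lambda a: a.start_s)
    starts = [a.start_s for a in ordered]
    lo = bisect_right(starts, playback_s - window_s)  # first start > window start
    hi = bisect_right(starts, playback_s)             # first start > current time
    return ordered[lo:hi]
```

A player loop could call `active_annotations(all_annotations, current_time)` on each refresh and draw the returned entities over the video. Sorting on every call keeps the sketch short; a real player would more likely keep the list sorted as annotations arrive.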

An audio and image-based on-demand content annotation framework for augmenting the video viewing experience on mobile devices / Gatteschi, Valentina; Lamberti, Fabrizio; Sanna, Andrea; Demartini, Claudio Giovanni. - PRINT. - (2015), pp. 468-472. (Paper presented at the IEEE 4th International Conference on Mobile Services, held in New York, USA, June 27-July 2, 2015) [10.1109/MobServ.2015.71].

An audio and image-based on-demand content annotation framework for augmenting the video viewing experience on mobile devices

Gatteschi, Valentina; Lamberti, Fabrizio; Sanna, Andrea; Demartini, Claudio Giovanni
2015

ISBN: 978-1-4673-7283-1

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2604565