Multimodal Retrieval with Diversification and Relevance Feedback for Tourist Attraction Images
Refereed Original Article
In this paper, we present a novel framework that produces a visual description of a tourist attraction by choosing, from community-contributed datasets, the most diverse pictures that describe different details of the queried location. The main strength of the proposed approach is its flexibility, which makes it possible to filter out non-relevant images and to obtain a reliable set of diverse and relevant images by first clustering similar images according to their textual descriptions and their visual content, and then extracting images from different clusters according to a measure of user credibility. Clustering is a two-step process: textual descriptions are used first, and the resulting clusters are then refined according to visual features. The degree of diversification can be further increased by exploiting users' judgments on the results produced by the proposed algorithm through a novel approach in which users provide not only relevance feedback but also diversity feedback. Experimental results on the MediaEval 2015 "Retrieving Diverse Social Images" dataset show that the proposed framework achieves very good performance both in the automatic retrieval of diverse images and when users' feedback is exploited. The effectiveness of the proposed approach has also been confirmed by a small case study involving a number of real users.
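As a rough illustration of the selection step described in the abstract, the sketch below picks images from different clusters in order of credibility. It is a minimal sketch under stated assumptions, not the authors' implementation: the cluster labels are assumed to be precomputed (text clustering first, then visual refinement), the field names `text_cluster`, `visual_cluster`, and `credibility` are hypothetical, and the round-robin visiting order is an assumption, since the abstract does not specify how clusters are traversed.

```python
from collections import defaultdict
from itertools import zip_longest


def diversify(images, k):
    """Return up to k image ids, drawing from different clusters first.

    Each image is a dict with hypothetical keys:
      'id'             - image identifier
      'text_cluster'   - label from the first (textual) clustering step
      'visual_cluster' - label from the visual refinement step
      'credibility'    - credibility score of the uploading user (higher is better)
    """
    if k <= 0:
        return []

    # A refined cluster is identified by its (textual, visual) label pair.
    clusters = defaultdict(list)
    for img in images:
        clusters[(img["text_cluster"], img["visual_cluster"])].append(img)

    # Inside each cluster, put the most credible images first.
    ranked = [
        sorted(members, key=lambda im: im["credibility"], reverse=True)
        for members in clusters.values()
    ]
    # Visit clusters in order of their most credible image (assumed policy).
    ranked.sort(key=lambda members: members[0]["credibility"], reverse=True)

    # Round-robin: take one image per cluster before revisiting any cluster.
    selected = []
    for round_of_images in zip_longest(*ranked):
        for img in round_of_images:
            if img is None:  # this cluster is already exhausted
                continue
            selected.append(img["id"])
            if len(selected) == k:
                return selected
    return selected
```

Taking one image per cluster before revisiting any cluster ensures that the top of the result list covers as many distinct clusters as possible, which is the diversification effect the abstract describes.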
Digital Object Identifier (DOI):
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
Dublin City University (DCU)
Open access repository: