One Size Does Not Fit All: Querying Web Polystores

Authors: 

Yasar Khan, Antoine Zimmermann, Alok Kumar, Vijay Gadepally, Mathieu d'Aquin, Ratnesh Sahay

Publication Type: 
Refereed Original Article
Abstract: 
Data retrieval systems are facing a paradigm shift due to the proliferation of specialised data storage engines (SQL, NoSQL, Column Stores, MapReduce, Data Stream, Graph) supported by varied data models (CSV, JSON, RDB, RDF, XML). One immediate consequence of this paradigm shift is a data bottleneck over the Web: Web applications are unable to retrieve data at the rate at which it is being generated by different facilities. Especially in the genomics and healthcare verticals, data is growing from petascale to exascale, and biomedical stakeholders expect seamless retrieval of these data over the Web. In this article, we argue that the bottleneck over the Web can be reduced by minimising the costly data conversion process and delegating query performance and processing loads to the specialised data storage engines over their native data models. We propose a Web-based query federation mechanism, called PolyWeb, that unifies query answering over multiple native data models (CSV, RDB, and RDF). We emphasise two main challenges of query federation over native data models: (i) devising a method to select prospective data sources, with different underlying data models, that can satisfy a given query; and (ii) query optimisation, join, and execution over different data models. We demonstrate PolyWeb on a cancer genomics use case, where the description of biological and chemical entities (e.g., gene, disease, drug, pathways) often spans multiple data models and their respective storage engines. To assess the benefits and limitations of evaluating queries over native data models, we compare PolyWeb with state-of-the-art query federation engines in terms of result completeness, source selection, and overall query execution time.
Digital Object Identifier (DOI): 
10.1109/ACCESS.2018.2888601
ISSN: 
2169-3536
Publication Status: 
Published
Date Accepted for Publication: 
Monday, 26 November, 2018
Publication Date: 
17/01/2019
Journal: 
IEEE Access
Volume: 
7
Issue: 
1
Pages: 
9598-9617
Research Group: 
Institution: 
National University of Ireland, Galway (NUIG)
Open access repository: 
Yes
Publication document: