Cataloguing and Linking Life Sciences LOD
Refereed Conference Meeting Proceeding
The Life Sciences Linked Open Data (LSLOD) Cloud currently comprises multiple datasets that add high value to biomedical research. The ability to navigate through these datasets in order to derive and discover new, meaningful biological correlations is considered one of the most significant assets for supporting clinical decision making. However, navigating these datasets is not easy: most of them are fragmented across multiple SPARQL endpoints, each containing trillions of triples, and are represented with insufficient vocabulary reuse. To retrieve and match the data from multiple endpoints that is needed to answer meaningful biological questions, it is first necessary to catalogue the data represented in each endpoint, so as to understand how powerful queries traversing several SPARQL endpoints can be assembled. In this report, we explore the schemas used to represent data in a total of 52 meaningful Life Sciences SPARQL endpoints and present our methodology for linking related concepts and properties from the “pool” of available elements. We found the outcome of this exploratory work helpful not only for identifying redundancy and gaps in the data, but also for enabling the assembly of complex federated queries. We present three different approaches used to link concepts and properties and discuss their applicability for creating complex links in the LSLOD cloud.
Keywords: Linked Open Data, SPARQL, Life Sciences, Query Element.
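The abstract does not give the cataloguing queries or the three linking approaches, so the following is only a minimal sketch under stated assumptions. It assumes a catalogue is built with generic SPARQL 1.1 queries that list the classes and properties used in each endpoint. It then proposes candidate links by matching the normalised local names of URIs across endpoints. This naive name-matching heuristic is a stand-in chosen for illustration, not one of the authors' approaches. All names here (`CLASS_CATALOGUE_QUERY`, `PROPERTY_CATALOGUE_QUERY`, `candidate_links`, and the `example.org` endpoints and URIs) are hypothetical.

```python
import re
from collections import defaultdict

# Generic SPARQL 1.1 queries for cataloguing an endpoint's schema (assumed,
# not taken from the report): classes with instance counts, and distinct properties.
CLASS_CATALOGUE_QUERY = """
SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class }
GROUP BY ?class
"""

PROPERTY_CATALOGUE_QUERY = """
SELECT DISTINCT ?property
WHERE { ?s ?property ?o }
"""


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("/#"))[-1]


def normalise(name: str) -> str:
    """Lower-case and drop non-alphanumerics, e.g. 'Drug_Target' -> 'drugtarget'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def candidate_links(catalogue: dict[str, list[str]]) -> dict[str, set[str]]:
    """Group element URIs from different endpoints that share a normalised local name.

    `catalogue` maps a (hypothetical) endpoint URL to the class or property URIs
    found there, e.g. by running the queries above. Only groups that span at
    least two endpoints are returned, since those are the cross-endpoint link
    candidates (and possible redundancies).
    """
    groups: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for endpoint, uris in catalogue.items():
        for uri in uris:
            groups[normalise(local_name(uri))].add((endpoint, uri))
    return {
        key: {uri for _, uri in members}
        for key, members in groups.items()
        if len({endpoint for endpoint, _ in members}) >= 2
    }


if __name__ == "__main__":
    # Hypothetical catalogue of two endpoints.
    catalogue = {
        "http://example.org/sparql/a": [
            "http://example.org/a/vocab#Drug",
            "http://example.org/a/vocab#Target",
        ],
        "http://example.org/sparql/b": [
            "http://example.org/b/resource/drug",
            "http://example.org/b/resource/Gene",
        ],
    }
    print(candidate_links(catalogue))
```

Here only the two "drug" classes are grouped, because they appear in different endpoints. Name matching like this misses synonyms and can merge unrelated terms, so any real linking would need curation or richer evidence.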
1st International Workshop on Ontology Engineering in a Data-driven World (OEDW 2012), co-located with the 8th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2012). 2012.
National University of Ireland, Galway (NUIG)