SemanTex: Semantic Text Exploration Using Document Links Implied by Conceptual Networks Extracted from the Texts
Refereed Conference Meeting Proceeding
Despite advances in digital document processing, exploring implicit relationships within large amounts of textual resources can still be daunting. This is partly due to the ‘black-box’ nature of most current methods for computing links (i.e., similarities) between documents (cf.  and ). These methods are mostly based on numeric computational models such as vector spaces or probabilistic classifiers. Such models may perform well according to standard IR evaluation methodologies, but can be sub-optimal in applications aimed at end users because the results and their provenance are difficult to interpret [3, 1]. Our Semantic Text Exploration prototype (abbreviated as SemanTex) aims at finding implicit links within a corpus of textual resources (such as articles or web pages) and exposing them to users in an intuitive front-end. We discover the links by: (1) finding concepts that are important in the corpus; (2) computing relationships between the concepts; (3) using the relationships to find links between the texts. The links are annotated with the concepts from which the particular connection was computed. Apart from presenting the links to human users for manual exploration in the SemanTex interfaces, we are working on representing the semantically annotated links between textual documents in RDF and exposing the resulting datasets for particular domains (such as PubMed or New York Times articles) as part of the Linked Open Data cloud. In the following we provide more details on the method and give an example of its practical application to browsing biomedical articles. A video example of a specific SemanTex prototype to be demonstrated at the conference can be viewed at http://www.fujitsu.com/ie/about/videos/research/.
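The three steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the SemanTex implementation: the abstract does not say how concepts are selected or how relationships are computed, so the sketch assumes document frequency as the importance criterion and within-document co-occurrence as the relationship measure. All function names, the stopword list, the thresholds, and the toy corpus are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "of", "and", "in", "a", "is", "to", "with", "by"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def important_concepts(docs, min_docs=2):
    """Step 1 (assumed heuristic): keep terms occurring in >= min_docs documents."""
    doc_freq = Counter(t for text in docs.values() for t in set(tokenize(text)))
    return {term for term, n in doc_freq.items() if n >= min_docs}


def concept_relations(docs, concepts):
    """Step 2 (assumed heuristic): count how often two concepts share a document."""
    relations = Counter()
    for text in docs.values():
        present = sorted(set(tokenize(text)) & concepts)
        for a, b in combinations(present, 2):
            relations[(a, b)] += 1  # keys are alphabetically sorted pairs
    return relations


def document_links(docs, concepts, relations, min_weight=2):
    """Step 3: link two documents through identical or related concepts.

    Each link is annotated with the concepts it was computed from,
    which is what makes the link explainable to the user.
    """
    doc_concepts = {d: set(tokenize(t)) & concepts for d, t in docs.items()}
    links = {}
    for d1, d2 in combinations(sorted(docs), 2):
        via = set()
        for c1 in doc_concepts[d1]:
            for c2 in doc_concepts[d2]:
                if c1 == c2:
                    via.add(c1)
                elif relations[tuple(sorted((c1, c2)))] >= min_weight:
                    via.update((c1, c2))
        if via:
            links[(d1, d2)] = via
    return links


# Toy corpus (hypothetical sentences, for illustration only).
docs = {
    "d1": "Aspirin inhibits platelet aggregation",
    "d2": "Platelet aggregation causes thrombosis",
    "d3": "Thrombosis risk is reduced by aspirin",
}
concepts = important_concepts(docs)
relations = concept_relations(docs, concepts)
links = document_links(docs, concepts, relations)

for (d1, d2), via in sorted(links.items()):
    print(d1, d2, sorted(via))
```

Because every link carries the set of concepts that produced it, a front-end can show why two texts are connected, in contrast to an opaque similarity score. Lowering `min_weight` lets weaker concept relationships contribute to links, at the cost of noisier annotations.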
International Semantic Web Conference
Digital Object Identifier (DOI):
National University of Ireland, Galway (NUIG)
Open access repository: