Unsupervised Classification of Health Content on Reddit

Authors:

Joana Barros, Paul Buitelaar, Jim Duggan, Dietrich Rebholz-Schuhmann

Publication Type:

Refereed Conference Meeting Proceeding

Abstract:

Online forums are easily accessible to the public and useful for acquiring and disseminating health information; however, advanced methods have to be applied to interpret the content correctly. For this reason, we propose the application of an unsupervised embedding-based approach for health content classification. Specifically, we utilise word embeddings and a clustering method to create content-sensitive word clusters; we then align the health content with the clusters, classifying it into illnesses, medication, or disease agents. The results suggest that a cosine similarity of 0.70 is preferred for the creation of informative clusters as well as for the automatic generation of synonyms, acronyms, abbreviations and common misspellings. Our approach demonstrates the potential of discussion forums, in particular Reddit, not only for unsupervised content classification but also for dictionary building from informal health content.
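
The pipeline outlined in the abstract (embed words, group them by cosine similarity, then match forum text against the groups) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the abstract does not name the clustering method, so a simple seed-based grouping is swapped in, and the three-dimensional vectors, the seed words, and the function names `build_clusters` and `classify` are hypothetical. Only the 0.70 cosine similarity value and the three category names come from the abstract.

```python
import math

# Toy 3-dimensional vectors standing in for trained word embeddings.
# The values are hypothetical and chosen only to make the example readable.
EMBEDDINGS = {
    "flu": (0.9, 0.1, 0.0),
    "influenza": (0.85, 0.2, 0.05),
    "ibuprofen": (0.1, 0.9, 0.1),
    "ibuprofin": (0.15, 0.88, 0.05),  # a common misspelling
    "virus": (0.1, 0.1, 0.95),
    "bacteria": (0.2, 0.05, 0.9),
}

# One hypothetical seed word per category named in the abstract.
SEEDS = {"illness": "flu", "medication": "ibuprofen", "disease agent": "virus"}

# Cosine similarity value reported as preferred in the abstract.
THRESHOLD = 0.70


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_clusters(embeddings, seeds, threshold=THRESHOLD):
    """Collect, per category, every word at least `threshold` similar to its seed."""
    clusters = {label: set() for label in seeds}
    for word, vec in embeddings.items():
        for label, seed in seeds.items():
            if cosine(vec, embeddings[seed]) >= threshold:
                clusters[label].add(word)
    return clusters


def classify(text, clusters):
    """Return the category whose cluster matches the most tokens, or None."""
    tokens = text.lower().split()
    counts = {
        label: sum(token in words for token in tokens)
        for label, words in clusters.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


clusters = build_clusters(EMBEDDINGS, SEEDS)
label = classify("took ibuprofin for my headache", clusters)
```

With these toy vectors, the misspelling "ibuprofin" lands in the same cluster as "ibuprofen", which is the kind of grouping the abstract credits for dictionary building. A real setup would presumably use embeddings trained on forum text and a proper clustering algorithm rather than fixed seeds.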

Conference Name:

International Digital Public Health Conference

Proceedings:

9th International Digital Public Health Conference

Digital Object Identifier (DOI):

10.1145/3357729.3357745

Publication Date:

23/11/2019

Conference Location:

France

Research Group:

Linked Data

Institution:

National University of Ireland, Galway (NUIG)

Open access repository:

No
