Clustering Mixed Data via Latent Variable Models


Damien McPartland, Claire Gormley

Publication Type: 
Refereed Conference Meeting Proceeding
A model-based clustering procedure for data of mixed type, termed clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. The model employs a parsimonious covariance structure for the latent variables, leading to a suite of six clustering models that vary in complexity and provide an elegant and unified approach to clustering mixed data. An expectation maximisation (EM) algorithm is used to estimate the model; in the presence of nominal data a Monte Carlo EM algorithm is required. The clustMD model is illustrated by clustering prostate cancer patients, on whom measurements of mixed type have been recorded.
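The generative idea in the abstract, a latent Gaussian mixture producing observed data of mixed type, can be illustrated with a minimal Python sketch. This is not the clustMD implementation: the function name `simulate_mixed_data`, the threshold-at-zero rule for the binary variable, the cutpoints for the ordinal variable, and the diagonal covariances are all assumptions made for illustration, and nominal variables are left out.

```python
import numpy as np


def simulate_mixed_data(n, weights, means, covs, ordinal_cutpoints, seed=0):
    """Simulate n observations with one continuous, one binary and one
    ordinal variable from a latent Gaussian mixture (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Draw a cluster label for each observation from the mixing weights.
    labels = rng.choice(len(weights), size=n, p=weights)
    # Draw a 3-dimensional latent vector from the chosen component.
    latent = np.stack(
        [rng.multivariate_normal(means[g], covs[g]) for g in labels]
    )
    # Assumption: the continuous variable is the latent value itself.
    continuous = latent[:, 0]
    # Assumption: the binary variable is the latent value thresholded at 0.
    binary = (latent[:, 1] > 0).astype(int)
    # Assumption: the ordinal level is the interval, defined by the
    # increasing cutpoints, into which the latent value falls.
    ordinal = np.digitize(latent[:, 2], ordinal_cutpoints)
    return labels, continuous, binary, ordinal


# Two hypothetical clusters with diagonal latent covariances.
weights = [0.5, 0.5]
means = [np.array([-2.0, -2.0, -2.0]), np.array([2.0, 2.0, 2.0])]
covs = [np.eye(3), np.eye(3)]
labels, continuous, binary, ordinal = simulate_mixed_data(
    200, weights, means, covs, ordinal_cutpoints=[-1.0, 1.0]
)
```

Clustering reverses this process: given only the observed columns, an EM-type algorithm would estimate the mixture parameters and the cluster membership of each observation.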
Conference Name: 
30th International Workshop on Statistical Modelling (IWSM)
Digital Object Identifier (DOI): 
Publication Date: 
Conference Location: 
National University of Ireland, Dublin (UCD)