Clustering Mixed Data via Latent Variable Models
Refereed Conference Meeting Proceeding
A model-based clustering procedure for data of mixed type, termed clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. The model employs a parsimonious covariance structure for the latent variables, leading to a suite of six clustering models that vary in complexity and provide an elegant and unified approach to clustering mixed data. An expectation maximisation (EM) algorithm is used to estimate the model; in the presence of nominal data a Monte Carlo EM algorithm is required. The clustMD model is illustrated by clustering prostate cancer patients, on whom measurements of mixed type have been recorded.
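The generative idea in the abstract can be illustrated with a minimal simulation sketch. It is not the clustMD implementation: the function name `simulate_mixed_data`, the two-dimensional latent vector, and the threshold values are all assumptions made for illustration, and nominal variables are left out. A latent vector is drawn from a Gaussian mixture; its first coordinate is observed directly as a continuous variable, while the second is observed only as the interval it falls into, giving an ordinal variable (a binary variable is the special case of a single threshold).

```python
import numpy as np

def simulate_mixed_data(n, weights, means, covs, thresholds, seed=0):
    """Draw n rows of (continuous, ordinal) data from a latent Gaussian mixture.

    Hypothetical helper for illustration only. Column 0 of the latent
    vector is observed directly (continuous); column 1 is observed only
    through the interval it falls in (ordinal).
    """
    rng = np.random.default_rng(seed)
    # Pick a mixture component (cluster) for each observation.
    labels = rng.choice(len(weights), size=n, p=weights)
    # Draw each latent vector from its component's Gaussian.
    latent = np.empty((n, 2))
    for g in range(len(weights)):
        idx = labels == g
        latent[idx] = rng.multivariate_normal(means[g], covs[g], size=int(idx.sum()))
    continuous = latent[:, 0]
    # Ordinal level = number of thresholds lying below the latent value.
    ordinal = np.searchsorted(thresholds, latent[:, 1])
    return labels, continuous, ordinal

# Two clusters with identity covariance (a placeholder choice, not one of
# the six covariance structures in the paper) and a three-level ordinal variable.
labels, continuous, ordinal = simulate_mixed_data(
    n=200,
    weights=[0.5, 0.5],
    means=[[-2.0, -1.0], [2.0, 1.0]],
    covs=[np.eye(2), np.eye(2)],
    thresholds=[-1.0, 1.0],
)
```

Clustering reverses this process: given only the continuous and ordinal columns, the mixture parameters and cluster labels are estimated, which the abstract states is done with an EM algorithm.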
30th International Workshop in Statistical Modelling (IWSM)
Digital Object Identifier (DOI):
National University of Ireland, Dublin (UCD)
Open access repository: