Missing Data Analysis Using Multiple Imputation in Relation to Parkinson’s Disease
Refereed Conference Meeting Proceeding
Missing data is an omnipresent problem in neurological diseases such as Parkinson's Disease. Statistical analyses of the level of Parkinson's Disease may be inaccurate if no adequate method for handling missing data is applied. To determine a useful way to treat missing data on Parkinson's stage, we propose a multiple imputation method based on copula theory, applied in the data pre-processing phase of the data mining process. We use copulas to estimate the multivariate joint probability distribution without restricting the marginal distributions of the random variables, which represent the dimensions of our datasets, to specific types. To evaluate the proposed approach, we compared our algorithm with seven state-of-the-art imputation methods (mean, regression, min, max, K-nearest neighbors, Markov Chain Monte Carlo, and Expectation-Maximization) on six dataset cases containing 5%, 15%, 25%, 35%, 45% and 50% missing data, respectively. The accuracy of each imputation method was evaluated using the Root Mean Square Error (RMSE). Our results indicate that the proposed method significantly outperforms the existing algorithms.
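The evaluation protocol described in the abstract (hide a fixed percentage of known values, impute them, and score the imputations with RMSE) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the data are synthetic rather than the Parkinson's datasets, values are hidden completely at random, and the imputer shown is the simple column-mean baseline, not the proposed copula-based method. The helper names `rmse`, `mask_at_random` and `mean_impute` are hypothetical.

```python
import numpy as np

def rmse(true_values, imputed_values):
    """Root Mean Square Error between held-out true values and their imputations."""
    true_values = np.asarray(true_values, dtype=float)
    imputed_values = np.asarray(imputed_values, dtype=float)
    return float(np.sqrt(np.mean((true_values - imputed_values) ** 2)))

def mask_at_random(data, missing_rate, rng):
    """Hide roughly `missing_rate` of the entries (assumption: missing completely at random)."""
    mask = rng.random(data.shape) < missing_rate
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask

def mean_impute(masked):
    """Baseline imputer: replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(masked, axis=0)
    filled = masked.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

rng = np.random.default_rng(0)
# Synthetic stand-in data (assumption): 200 records, 4 numeric attributes.
data = rng.normal(loc=50.0, scale=10.0, size=(200, 4))

# The six missing-data percentages named in the abstract.
scores = {}
for rate in (0.05, 0.15, 0.25, 0.35, 0.45, 0.50):
    masked, mask = mask_at_random(data, rate, rng)
    filled = mean_impute(masked)
    # Score only the entries that were hidden, since only those were imputed.
    scores[rate] = rmse(data[mask], filled[mask])

for rate, score in scores.items():
    print(f"{rate:.0%} missing: RMSE = {score:.3f}")
```

Any other imputer could be compared under the same protocol by swapping it in for `mean_impute`, provided it returns a fully filled array of the same shape.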
BDAW’16, Blagoevgrad, Bulgaria
Digital Object Identifier (DOI):
National University of Ireland, Dublin (UCD)
Open access repository: