Dimensionality reduction in data mining: A Copula approach
Refereed Original Article
The recent trends in collecting huge and diverse datasets have created a great challenge in data analysis. One of the characteristics of these gigantic datasets is that they often have significant amounts of redundancies. The use of very large multi-dimensional data will result in more noise, redundant data, and the possibility of unconnected data entities. To efficiently manipulate data represented in a highdimensional space and to address the impact of redundant dimensions on the final results, we propose a new technique for the dimensionality reduction using Copulas and the LU-decomposition (Forward Substitution) method. The proposed method is compared favorably with existing approaches on real-world datasets: Diabetes, Waveform, two versions of Human Activity Recognition based on Smartphone, and Thyroid Datasets taken from machine learning repository in terms of dimensionality reduction and effi- ciency of the method, which are performed on statistical and classification measures.
Digital Object Identifer (DOI):
Expert Systems with Applications
National University of Ireland, Dublin (UCD)
Open access repository: