Posted on 2017-12-06, 00:00, authored by A B M Shawkat Ali
Clustering is one of the fundamental research areas in data mining and has received a significant amount of attention from the machine learning community in recent years. Among the wide range of clustering algorithms, K-means is one of the most popular. In this research, we extend the K-means algorithm with the well-known radial basis function (RBF) kernel and obtain better performance than the classical K-means algorithm. A critical issue for the RBF kernel is how to select a unique parameter value that gives optimal clustering. This chapter provides a statistics-based solution to this issue. The best parameter is selected on the basis of prior information about the data, using the Maximum Likelihood (ML) method and the Nelder-Mead (N-M) simplex method. A rule-based meta-learning approach is then proposed for automatic RBF kernel parameter selection. We consider 112 supervised data sets and measure their statistical characteristics using basic statistics, central tendency measures, and an entropy-based approach. We split these data characteristics with the well-known decision tree approach to generate rules. Finally, we use the generated rules to select the unique parameter value for the RBF kernel and then adopt it in the K-means algorithm. The experiments were carried out on 112 problems using 10-fold cross-validation. Overall, the proposed algorithm can solve any clustering task quickly with optimal performance.
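The abstract does not spell out the algorithm, but the core idea of adding an RBF kernel to K-means can be sketched as kernel K-means. In this approach, the distance from a point to a cluster centre is computed in the kernel-induced feature space using only kernel values: ||phi(x_i) - mu_C||^2 = K_ii - (2/|C|) sum_{j in C} K_ij + (1/|C|^2) sum_{j,l in C} K_jl. This is a minimal sketch under stated assumptions, not the chapter's implementation. The kernel is assumed to take the form exp(-||x - y||^2 / (2 sigma^2)) with a width parameter `sigma`. Initialization uses farthest-first seeding, which is also an assumption. The names `rbf_kernel` and `kernel_kmeans` are hypothetical, and `k` must not exceed the number of points. The sketch does not implement the chapter's ML, Nelder-Mead, or rule-based selection of the parameter; `sigma` is simply passed in.

```python
import math
import random


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_kmeans(data, k, sigma, max_iter=100, seed=0):
    """Hypothetical kernel K-means sketch; returns one label in range(k) per point."""
    n = len(data)
    # Gram matrix: gram[i][j] = k(x_i, x_j), computed once up front.
    gram = [[rbf_kernel(data[i], data[j], sigma) for j in range(n)] for i in range(n)]

    # Farthest-first seeding in feature space (an assumption, not from the chapter):
    # a random first seed, then repeatedly the point whose largest kernel value
    # to any chosen seed is smallest, i.e. the point farthest from all seeds.
    rng = random.Random(seed)
    seeds = [rng.randrange(n)]
    while len(seeds) < k:
        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: max(gram[i][s] for s in seeds)))
    # Initial assignment: nearest seed, i.e. the seed with the largest kernel value.
    labels = [max(range(k), key=lambda c: gram[i][seeds[c]]) for i in range(n)]

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Per-cluster constant term (1/|C|^2) * sum_{j,l in C} K[j][l].
        within = [
            sum(gram[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip clusters that became empty
                # ||phi(x_i) - mu_C||^2 = K_ii - (2/|C|) * sum_j K_ij + within_C
                cross = sum(gram[i][j] for j in m) / len(m)
                d = gram[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels
```

For example, `kernel_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2, sigma=1.0)` places the first three points in one cluster and the last three in the other. The result depends on `sigma`, and that dependence is why selecting a good RBF parameter value is the central issue the chapter addresses.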
Funding
Category 1 - Australian Competitive Grants (this includes ARC, NHMRC)