CQUniversity

Optimal clusters generation for maximizing ensemble classifier performance

Conference contribution
Posted on 2021-06-24, 23:33, authored by Muhammad Zohaib Jan and Brijesh Verma
Clustering-based ensemble classifiers have attracted considerable attention recently because of their ability to effectively classify real-world noisy datasets. One way of incorporating clustering in ensembles is to use a clustering algorithm such as k-means to generate a pool of data clusters. This produces a random subspace on which base classifiers are trained, as opposed to bagging. A key parameter of clustering algorithms is the number of clusters, i.e. the number of groups the data should be partitioned into, commonly denoted K. Most existing approaches either determine the value of K through trial and error or use some derived formula-based approach. The first problem is that using a static value of K across different datasets is not ideal: a value that works well for one dataset may not work well for others. Second, calculating K with a formula applied to the raw data is not effective either, as unbalanced data can have a negative effect on the derived value. Therefore, in this paper we first segregate the data by class and then perform a Silhouette analysis on each class to determine the optimal number of clusters that class should be separated into. The generated clusters are class pure and are then balanced by adding the samples from other classes that are closest to each cluster centroid. In this manner we generate a random subspace of augmented data composed of class-balanced data clusters. A diverse set of base classifiers is trained on the balanced data clusters, and an ensemble is formed. The proposed ensemble approach is tested on 16 benchmark UCI datasets, and the results are compared with single classifiers as well as state-of-the-art ensemble classifier approaches. A set of non-parametric tests is also adopted to further validate the efficacy of the results.
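The per-class cluster-count selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's KMeans and silhouette_score, and the function name, candidate K range, and fallback for tiny classes are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def optimal_k_per_class(X, y, k_range=range(2, 8)):
    """Return {class_label: best_K}, choosing K by Silhouette analysis
    on the samples of each class taken in isolation."""
    best_k = {}
    for label in np.unique(y):
        Xc = X[y == label]            # segregate the data by class
        scores = {}
        for k in k_range:
            if k >= len(Xc):          # cannot form k non-trivial clusters
                break
            labels = KMeans(n_clusters=k, n_init=10,
                            random_state=0).fit_predict(Xc)
            # mean silhouette coefficient for this partitioning of the class
            scores[k] = silhouette_score(Xc, labels)
        # keep the K with the highest silhouette score (1 cluster if none fit)
        best_k[label] = max(scores, key=scores.get) if scores else 1
    return best_k
```

Running k-means per class (rather than on the raw pooled data) is what shields the derived K from class imbalance; the resulting class-pure clusters would then be balanced with the nearest samples from other classes before training the base classifiers.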

Funding

Category 1 - Australian Competitive Grants (this includes ARC, NHMRC)

History

Start Page

1

End Page

6

Number of Pages

6

Start Date

2020-07-19

Finish Date

2020-07-24

ISBN-13

9781728169262

Location

Glasgow, UK

Publisher

IEEE

Place of Publication

Piscataway, NJ

Peer Reviewed

  • Yes

Open Access

  • No

Author Research Institute

  • Centre for Intelligent Systems

ERA Eligible

  • Yes

Name of Conference

International Joint Conference on Neural Networks (IJCNN 2020)

Parent Title

Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN)