Optimal clusters generation for maximizing ensemble classifier performance

Jan, Muhammad Zohaib; Verma, Brijesh

conference contribution

posted on 2021-06-24, 23:33 authored by Muhammad Zohaib Jan, Brijesh Verma

Clustering-based ensemble classifiers have received considerable attention recently because of their ability to classify noisy real-world datasets effectively. One way of incorporating clustering into ensembles is to use a clustering algorithm such as k-means to generate a pool of data clusters. This produces random subspaces on which base classifiers are trained, as an alternative to bagging. A key parameter of clustering algorithms is the number of clusters, i.e. the number of groups into which the data should be partitioned, commonly denoted K. Most existing approaches either determine the value of K through trial and error or derive it from a formula. This raises two problems. First, a static value of K is not ideal across different datasets: a value that works well for one dataset may not work well for others. Second, calculating K from the raw data with a formula-based approach is not effective either, because unbalanced data can have a negative effect on the derived value. Therefore, in this paper we first segregate the data by class and then perform a Silhouette analysis on each class to determine the optimal number of clusters into which that class should be separated. The generated clusters, which are class pure, are then balanced by adding the samples from other classes that are closest to the cluster centroid. In this manner we generate a random subspace of augmented data composed of class-balanced data clusters. A diverse set of base classifiers is trained on all of the balanced data clusters, and an ensemble is formed. The proposed ensemble approach is tested on 16 benchmark UCI datasets, and the results are compared with single classifiers as well as state-of-the-art ensemble classifier approaches. A set of non-parametric tests is also adopted to further validate the efficacy of the results.
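The two steps described in the abstract, choosing K per class by Silhouette analysis and balancing a class-pure cluster with nearby samples from other classes, can be illustrated with a small sketch. This is not the authors' implementation. The tiny deterministic k-means, the toy 2-D data, the helper names (kmeans, silhouette, best_k, balance_cluster) and the choice to add as many other-class samples as the cluster already holds are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from collections import defaultdict

def kmeans(points, k, iters=50):
    """Tiny deterministic k-means (assumption: farthest-point initialisation)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest already-chosen seed.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute centroids.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b); singletons score 0."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, k_values):
    """Return the candidate K with the highest mean silhouette for one class."""
    scores = {k: silhouette(points, kmeans(points, k)[0]) for k in k_values}
    return max(scores, key=scores.get)

def balance_cluster(members, centroid, other_points):
    """Add the other-class samples closest to the centroid.

    Assumption: as many samples are added as the cluster already contains.
    """
    nearest = sorted(other_points, key=lambda p: math.dist(p, centroid))[:len(members)]
    return members + nearest

# Toy data (hypothetical): class A forms two well-separated groups.
class_a = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
class_b = [(5, 5), (5, 6), (6, 5), (6, 6), (20, 20), (21, 21)]

k_a = best_k(class_a, [2, 3, 4])          # K is chosen for class A only
labels, centroids = kmeans(class_a, k_a)
first = [p for p, lab in zip(class_a, labels) if lab == 0]
balanced = balance_cluster(first, centroids[0], class_b)
print(k_a, balanced)
```

In practice a library routine would normally replace the hand-written parts; the sketch uses only the standard library so that it stays self-contained. Each balanced cluster would then be used to train base classifiers, as the abstract describes.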

Funding

Category 1 - Australian Competitive Grants (this includes ARC, NHMRC)

History

Start Page

1

End Page

6

Number of Pages

6

Start Date

2020-07-19

Finish Date

2020-07-24

ISBN-13

9781728169262

Location

Glasgow, UK

Publisher

IEEE

Place of Publication

Piscataway, NJ

Publisher DOI

https://dx.doi.org/10.1109/IJCNN48605.2020.9207157

Full Text URL

https://ieeexplore.ieee.org/document/9207157

Peer Reviewed

Yes

Open Access

No

Author Research Institute

Centre for Intelligent Systems

Era Eligible

Yes

Name of Conference

International Joint Conference on Neural Networks (IJCNN 2020)

Parent Title

Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN)

Keywords

Clustering; Ensemble classifiers; Machine learning; Neural, Evolutionary and Fuzzy Computation; Pattern Recognition and Data Mining

Licence

CQUniversity General 1.0
