CQUniversity

Removing bias from diverse data clusters for ensemble classification

conference contribution
posted on 2018-05-22, authored by Samuel Fletcher, Brijesh Verma
Diversity plays an important role in successful ensemble classification. One way to diversify the base-classifiers in an ensemble classifier is to diversify the data they are trained on. Sampling techniques such as bagging have been used for this task in the past; however, we argue that because they maintain the global distribution, they do not engender diversity. We instead make a principled argument for the use of k-Means clustering to create diversity. When creating multiple clusterings with multiple k values, there is a risk of different clusterings discovering the same clusters, which would then be used to train identical base-classifiers. This would bias the ensemble voting process. We propose a new approach that uses the Jaccard Index to detect and remove similar clusters before training the base-classifiers, reducing classification error by removing repeated votes. We demonstrate the effectiveness of our proposed approach by comparing it to three state-of-the-art ensemble algorithms on eight UCI datasets.
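The duplicate-removal step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each clustering (for example, a k-Means run with a particular k) is given as a list of cluster labels, one per training sample, and it treats each cluster as the set of sample indices it contains. The function names and the `threshold` parameter are hypothetical; the abstract does not state how "similar" is defined, so a cluster is simply dropped here when its Jaccard index with any already-kept cluster reaches the threshold.

```python
from typing import Dict, FrozenSet, List, Sequence, Set


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two clusters of sample indices."""
    union = a | b
    if not union:
        return 1.0  # two empty clusters are treated as identical
    return len(a & b) / len(union)


def clusters_from_labels(labels: Sequence[int]) -> List[FrozenSet[int]]:
    """Convert one clustering (a label per sample) into index sets, ordered by label."""
    groups: Dict[int, Set[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    return [frozenset(groups[lab]) for lab in sorted(groups)]


def remove_similar_clusters(
    clusterings: Sequence[Sequence[int]], threshold: float = 0.9
) -> List[FrozenSet[int]]:
    """Pool the clusters from every clustering and keep a cluster only if its
    Jaccard index with every previously kept cluster is below `threshold`
    (a hypothetical parameter). Each kept cluster would then train one
    base-classifier, so repeated clusters no longer cast repeated votes."""
    kept: List[FrozenSet[int]] = []
    for labels in clusterings:
        for cluster in clusters_from_labels(labels):
            if all(jaccard(cluster, k) < threshold for k in kept):
                kept.append(cluster)
    return kept


if __name__ == "__main__":
    # Three toy clusterings of six samples: k=2, k=3, and k=2 with permuted labels.
    runs = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 2], [1, 1, 1, 0, 0, 0]]
    for c in remove_similar_clusters(runs):
        print(sorted(c))
```

Comparing clusters as sets of sample indices, rather than by label, matters because k-Means assigns arbitrary label numbers: the third toy clustering is identical to the first with its labels swapped, and all of its clusters are removed. Of the seven pooled clusters, four survive (`[0, 1, 2]`, `[3, 4, 5]`, `[3, 4]`, `[5]`), so four rather than seven base-classifiers would be trained.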

Funding

Category 1 - Australian Competitive Grants (this includes ARC, NHMRC)

Editor

Liu D; Xie S; Li Y; El-Alfy EM

Volume

LNCS 10637

Start Page

140

End Page

149

Number of Pages

10

Start Date

2017-11-14

Finish Date

2017-11-18

eISSN

1611-3349

ISSN

0302-9743

ISBN-13

9783319700922

Location

Guangzhou, China

Publisher

Springer

Place of Publication

Cham, Switzerland

Peer Reviewed

  • Yes

Open Access

  • No

Author Research Institute

  • Centre for Intelligent Systems

Era Eligible

  • Yes

Name of Conference

24th International Conference on Neural Information Processing (ICONIP 2017)