CQUniversity

File(s) not publicly available

CSS: Handling imbalanced data by improved clustering with stratified sampling

journal contribution
posted on 2024-04-22, 02:29 authored by L Cao, Hong Shen
The traditional support vector machine (SVM) technique has drawbacks when dealing with imbalanced data. To address this issue, we propose an algorithm, improved clustering with stratified sampling (CSS), to improve the classification performance of SVMs on imbalanced datasets. Instead of applying a single sampling method, as is done in the literature, our algorithm treats different types of classes with different sampling methods. For minority classes, the algorithm oversamples by adding normally distributed noise around every support vector to generate new samples. For majority classes, samples are first divided into clusters by applying the improved clustering by fast search and find of density peaks (CFSFDP) algorithm, which captures the latent structure of each majority class; stratified sampling is then applied to extract samples from each subcluster of the majority class. Moreover, we extend this method into an ensemble classifier that uses multiple base SVM classifiers for prediction. Experimental results on several imbalanced classification datasets show that CSS is more effective than state-of-the-art sampling methods.
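The two sampling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `oversample_around` and `stratified_undersample`, the noise scale `sigma`, and the sampling `fraction` are assumptions made for illustration. The sketch also assumes the support vectors (from a trained SVM) and the cluster labels (from CFSFDP or any other clustering step) are already available as plain lists, so neither the SVM nor the improved CFSFDP algorithm is reproduced here.

```python
import random
from collections import defaultdict


def oversample_around(support_vectors, n_new, sigma=0.1, seed=0):
    """Create n_new synthetic minority samples by adding Gaussian noise
    to the given support vectors, cycling through them in order.
    sigma is a hypothetical noise scale, not a value from the paper."""
    rng = random.Random(seed)
    new_samples = []
    for i in range(n_new):
        sv = support_vectors[i % len(support_vectors)]
        new_samples.append([x + rng.gauss(0.0, sigma) for x in sv])
    return new_samples


def stratified_undersample(samples, cluster_ids, fraction, seed=0):
    """Keep the same fraction of samples from every cluster
    (at least one per cluster), so each subcluster stays represented."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for sample, cid in zip(samples, cluster_ids):
        by_cluster[cid].append(sample)
    kept = []
    for cid in sorted(by_cluster):
        members = by_cluster[cid]
        k = max(1, round(fraction * len(members)))
        kept.extend(rng.sample(members, k))  # sample without replacement
    return kept


# Toy data: two minority support vectors, twenty majority samples
# split into two clusters of sizes 12 and 8 (labels assumed given).
minority_svs = [[0.0, 0.0], [1.0, 1.0]]
synthetic = oversample_around(minority_svs, n_new=6)

majority = [[float(i), float(i)] for i in range(20)]
clusters = [0] * 12 + [1] * 8
reduced = stratified_undersample(majority, clusters, fraction=0.25)

print(len(synthetic), len(reduced))  # prints "6 5"
```

With a fraction of 0.25, the sketch keeps 3 of the 12 samples in the first cluster and 2 of the 8 in the second, so the reduced majority set preserves the relative cluster sizes instead of sampling the class as a single pool.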

Funding

Category 2 - Other Public Sector Grants Category

Volume: 34
Issue: 2
Start Page: 1
End Page: 17
Number of Pages: 17
eISSN: 1532-0634
ISSN: 1532-0626
Publisher: Wiley
Language: en
Peer Reviewed: Yes
Open Access: No
Acceptance Date: 2020-10-19
Era Eligible: Yes
Journal: Concurrency and Computation: Practice and Experience
Article Number: e6071
