CQUniversity

Sampling-based machine learning models for intrusion detection in imbalanced dataset

journal contribution
posted on 2024-05-29, 20:58 authored by Zongwen Fan, Shaleeza Sohail, Fariza Sabrina, Xin Gu
Cybersecurity is an important consideration when adopting IoT devices in smart applications. Even though a huge volume of data is available, attack-related data generally make up a significantly smaller proportion of it. Although machine learning models have been successfully applied to detecting security attacks on smart applications, their performance suffers from this data imbalance: the prediction model is biased toward the majority class, while its performance on the minority class is poor. To address this problem, we apply two oversampling techniques and two undersampling techniques to balance the data across categories. To verify their performance, five machine learning models, namely the decision tree, multi-layer perceptron, random forest, XGBoost, and CatBoost, are used in the experiments, with grid search and 10-fold cross-validation for parameter tuning. The results show that both the oversampling and undersampling techniques can improve the performance of the prediction models used. The XGBoost model combined with SMOTE performs best, with accuracy of 75%, weighted average precision of 82%, weighted average recall of 75%, weighted average F1 score of 78%, and a Matthews correlation coefficient of 72%. This indicates that this oversampling technique is effective for multi-attack prediction under a data imbalance scenario.
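
The abstract names SMOTE as the best-performing balancing technique but does not show how it works. As a rough illustration only, and not the authors' implementation, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, which is the core idea behind SMOTE. The function name `smote_sketch`, its parameters `k` and `seed`, and the toy data are all assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy imbalanced data (made-up values): 8 "normal" rows vs 3 "attack" rows.
normal = [(float(x), float(y)) for x in range(4) for y in range(2)]
attack = [(10.0, 10.0), (11.0, 10.5), (10.5, 12.0)]

# Oversample the attack class until both classes have the same size.
new_attack = smote_sketch(attack, n_synthetic=len(normal) - len(attack))
balanced_attack = attack + new_attack
```

Because every synthetic point is a convex combination of two real minority samples, it always falls inside the region the minority class already occupies. In practice, resampling of this kind would normally be applied to the training folds only, so that the evaluation data keep their original class distribution.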

History

Volume: 13
Issue: 10
Start Page: 1
End Page: 19
Number of Pages: 19
eISSN: 2079-9292
Publisher: MDPI AG
Additional Rights: CC BY 4.0 DEED
Language: en
Peer Reviewed: Yes
Open Access: Yes
Acceptance Date: 2024-05-01
Era Eligible: Yes
Journal: Electronics
Article Number: 1878
