CQUniversity
File(s) not publicly available

MSD-Kmeans: A hybrid algorithm for efficient detection of global and local outliers

conference contribution
posted on 2024-04-07, 23:55 authored by Yuanyuan Wei, Julian Jang-Jaccard, Fariza Sabrina, Timothy McIntosh
Existing outlier detection algorithms differ in their sensitivity to noisy data such as extreme values. In this paper, we propose a novel cluster-based outlier detection algorithm named MSD-Kmeans that combines the statistical method of Mean and Standard Deviation (MSD) with the machine learning clustering algorithm K-means to detect outliers more accurately and with better control of extreme values. MSD-Kmeans has two phases: (1) applying the MSD algorithm to eliminate as much noisy data as possible, minimizing its interference with the clusters, and (2) applying the K-means algorithm to obtain locally optimal clusters. We evaluate our algorithm and demonstrate its effectiveness in the context of detecting possible overcharging of taxi fares. We compare the performance indicators of MSD-Kmeans with those of other outlier detection algorithms, both statistical and machine learning-based, and show that the proposed algorithm achieves the highest precision, accuracy, and F-measure. This illustrates that MSD-Kmeans can be used for effective and efficient outlier detection on data of varying quality on IoT devices.
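The two phases described in the abstract can be illustrated with a minimal sketch. The abstract does not give the threshold, the number of clusters, or the rule used to flag outliers after clustering, so everything below is an assumption made for illustration: the 3-sigma cut-off, the one-dimensional data (a hypothetical fare-per-kilometre value), the quantile-based centroid initialization, and the per-cluster distance rule for local outliers. The function names `msd_filter`, `kmeans_1d`, and `msd_kmeans` are hypothetical and are not taken from the paper.

```python
import statistics


def msd_filter(values, n_sigma=3.0):
    """Phase 1 (assumed form): flag values farther than n_sigma standard
    deviations from the mean as global outliers; return (kept, flagged)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values), []
    kept, flagged = [], []
    for v in values:
        (flagged if abs(v - mean) > n_sigma * std else kept).append(v)
    return kept, flagged


def kmeans_1d(values, k, iters=100):
    """Phase 2: plain Lloyd's K-means on scalars.
    Centroids start at evenly spaced quantiles (an assumption, chosen so the
    sketch is deterministic). Converges to a local optimum only."""
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        updated = [statistics.fmean(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def msd_kmeans(values, k=2, global_sigma=3.0, local_sigma=3.0):
    """Run both phases; return (global_outliers, local_outliers).
    The local rule (distance to centroid > local_sigma * cluster std)
    is an illustrative assumption, not the paper's stated criterion."""
    kept, global_outliers = msd_filter(values, global_sigma)
    centroids, clusters = kmeans_1d(kept, k)
    local_outliers = []
    for centroid, members in zip(centroids, clusters):
        if len(members) < 2:
            continue
        spread = statistics.pstdev(members)
        if spread == 0:
            continue
        local_outliers += [v for v in members
                           if abs(v - centroid) > local_sigma * spread]
    return global_outliers, local_outliers


# Hypothetical fare-per-km values: two normal groups, one value that is
# unusual only relative to its own group, and one extreme value.
low = [10 + 0.1 * (i % 5 - 2) for i in range(20)]
high = [30 + 0.1 * (i % 5 - 2) for i in range(20)]
fares = low + high + [12.0, 500.0]
print(msd_kmeans(fares, k=2))
```

Removing the extreme value before clustering keeps it from pulling a centroid toward itself, which is the interference the first phase is meant to reduce. The value that is unusual only within its own group passes the global filter and is caught by the per-cluster check.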

History

Editor

Hu J; Mumtaz S; Cheng X

Start Page

87

End Page

94

Number of Pages

8

Start Date

2021-10-20

Finish Date

2021-10-22

ISBN-13

9781665400381

Location

Shenyang, China

Publisher

IEEE

Place of Publication

Piscataway, NJ

Peer Reviewed

  • Yes

Open Access

  • No

Era Eligible

  • Yes

Name of Conference

2021 IEEE 15th International Conference on Big Data Science and Engineering (BigDataSE 2021)

Parent Title

Proceedings of the 2021 IEEE 15th International Conference on Big Data Science and Engineering (BigDataSE)