CQUniversity

Intelligent type 2 diabetes risk prediction from administrative claim data

journal contribution
posted on 2024-08-19, 03:12 authored by S Uddin, Tasadduq Imam, ME Hossain, Ergun Gide, OA Sianaki, MA Moni, AA Mohammed, V Vandana
Type 2 diabetes is a chronic, costly disease and a serious global population health problem. Yet the disease is manageable and preventable given early warning. This study applies supervised machine learning algorithms to develop predictive models for type 2 diabetes from administrative claim data. Following guidelines from the Elixhauser Comorbidity Index, 31 variables were considered. Five supervised machine learning algorithms were used to develop type 2 diabetes prediction models. Principal component analysis was applied to rank the variables’ importance in the predictive models. Random forest (RF) showed the highest accuracy (85.06%) among the algorithms, closely followed by k-nearest neighbor (84.48%). The analysis further revealed RF as a high-performing algorithm irrespective of data imbalance. According to the principal component analysis, patient age is the most important predictor of type 2 diabetes, followed by a comorbid condition (solid tumor without metastasis). The finding that RF is the best-performing classifier is consistent with the promise of tree-based algorithms for public data reported in other works. The outcome can thus guide the design of automated surveillance of patients at risk of developing diabetes from administrative claim information, and will be useful to health regulators and insurers.
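The abstract compares classifiers by accuracy and comments on performance under data imbalance. As a minimal illustration only, the sketch below scores a small k-nearest neighbor classifier with both plain accuracy and balanced accuracy (the mean of per-class recall), which is one common way to check whether a headline accuracy figure holds up when one class dominates. Everything here is an assumption made for illustration: the two toy features, the records, the choice of k = 3, and the use of balanced accuracy. None of it is the study's data, feature set, or evaluation procedure.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so the minority class counts as much as the majority."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy records: (age in decades, comorbidity count); label 1 = type 2 diabetes.
train_X = [(3.0, 0), (3.5, 1), (4.0, 0), (4.2, 1), (6.5, 3), (7.0, 2)]
train_y = [0, 0, 0, 0, 1, 1]
test_X = [(3.2, 0), (4.1, 1), (3.8, 0), (6.8, 3), (5.0, 1)]
test_y = [0, 0, 0, 1, 1]

pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print("accuracy:", accuracy(test_y, pred))
print("balanced accuracy:", balanced_accuracy(test_y, pred))
```

On this toy set one of the two positive records is missed, so balanced accuracy comes out lower than plain accuracy. A gap of that kind is what a check for robustness to imbalance is meant to expose.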

Volume

47

Issue

3

Start Page

243

End Page

257

Number of Pages

15

eISSN

1753-8165

ISSN

1753-8157

Publisher

Informa UK Limited

Language

en

Peer Reviewed

  • Yes

Open Access

  • No

Era Eligible

  • Yes

Medium

Print-Electronic

Journal

Informatics for Health and Social Care
