Piyush Mathur, MD, FCCM
Anesthesiologist & Intensivist
The Cleveland Clinic
Cleveland, Ohio, United States
Disclosure information not submitted.
Ritu Panda, NA
Research Assistant
Indian Institute of Science, Bangalore, India
Disclosure information not submitted.
Saumya Sinha, NA
Graduate Student
Royal Melbourne Institute of Technology, Melbourne, Australia
Disclosure information not submitted.
Smitan Pradhan, NA
Graduate Student
Royal Melbourne Institute of Technology, Melbourne, Australia
Disclosure information not submitted.
Title: Machine Learning Model for Diagnosis of Diabetes among Critically Ill Patients
Introduction: Globally, diabetes is one of the leading causes of mortality and morbidity. In 2017, the global incidence, prevalence, deaths, and disability-adjusted life-years (DALYs) associated with diabetes were 22.9 million, 476.0 million, 1.37 million, and 67.9 million, respectively. Delayed and missed diagnoses of diabetes are common and can lead to significant morbidity and mortality. We therefore used critical care data to screen for and predict a diagnosis of diabetes. Because such data are routinely available for hospitalized patients, they offer an opportunity to develop screening tools that can guide appropriate patient care.
Methods: We performed a retrospective study of 130,157 patient encounters using data from the first 24 hours of intensive care collected by MIT's Global Open Source Severity of Illness Score (GOSSIS) initiative. Features included patients' vital signs, laboratory data, APACHE (Acute Physiology and Chronic Health Evaluation) scores, and demographic information. Demographic fields such as ethnicity and gender were removed from the training dataset before processing to avoid introducing bias into the model. New features combining historical conditions and APACHE scores were engineered and used to train a gradient-boosted machine learning model.
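The preprocessing described above (dropping demographic fields and deriving features that combine historical conditions with the APACHE score) can be sketched per encounter as follows. This is a minimal, standard-library-only illustration under stated assumptions: the column names `cirrhosis`, `hepatic_failure`, `immunosuppression`, and `apache_score`, and the two engineered features `history_count` and `history_x_apache`, are hypothetical placeholders. The abstract does not specify which history variables or combinations were actually used.

```python
from typing import Any, Dict, List

# Demographic fields removed before training, as described in the Methods.
EXCLUDED_FIELDS = {"ethnicity", "gender"}

# Hypothetical binary history flags (1 = condition documented); the study's
# actual history variables are not listed in the abstract.
HISTORY_FLAGS = ["cirrhosis", "hepatic_failure", "immunosuppression"]


def prepare_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Drop excluded demographics and add two illustrative engineered features."""
    features = {k: v for k, v in record.items() if k not in EXCLUDED_FIELDS}
    # Number of documented historical conditions for this encounter.
    history_count = sum(int(features.get(flag, 0)) for flag in HISTORY_FLAGS)
    features["history_count"] = history_count
    # Hypothetical interaction term: history burden scaled by the APACHE score
    # ("apache_score" is an assumed column name).
    features["history_x_apache"] = history_count * float(features.get("apache_score", 0.0))
    return features


def prepare_dataset(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply prepare_record to every patient encounter."""
    return [prepare_record(r) for r in records]


encounter = {"age": 64, "gender": "F", "ethnicity": "Caucasian", "apache_score": 58.0,
             "cirrhosis": 0, "hepatic_failure": 1, "immunosuppression": 1}
print(prepare_record(encounter))
```

Removing the fields before any feature derivation keeps them from leaking back into the model through engineered columns.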
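The model was implemented using LightGBM, a high-performance, distributed gradient boosting framework. LightGBM builds an ensemble of decision trees in which each internal node splits the data on the value of a feature and each leaf holds a predicted score; the scores from all trees are summed and, for binary classification, converted to a probability. Rather than growing trees level by level, LightGBM grows them leaf-wise, repeatedly splitting the leaf that yields the largest reduction in training loss. With gradient-based one-side sampling (GOSS), it also retains the training instances with large gradients (larger training errors) and randomly samples from those with small gradients.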
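The sampling step mentioned above can be illustrated with a small, standard-library-only sketch of gradient-based one-side sampling as described for LightGBM. It is not the library's implementation, and in LightGBM GOSS is an optional sampling strategy rather than the default. The function name `goss_sample` and the parameters `top_rate` and `other_rate` are illustrative: the sketch keeps the `top_rate` fraction of instances with the largest absolute gradient, randomly samples `other_rate * n` of the rest, and up-weights those sampled instances by `(1 - top_rate) / other_rate` so that split-gain estimates remain approximately unbiased.

```python
import random
from typing import List, Tuple


def goss_sample(gradients: List[float], top_rate: float, other_rate: float,
                seed: int = 0) -> List[Tuple[int, float]]:
    """Return (instance_index, weight) pairs selected by GOSS-style sampling."""
    if not (0.0 < other_rate and 0.0 <= top_rate and top_rate + other_rate <= 1.0):
        raise ValueError("need other_rate > 0 and top_rate + other_rate <= 1")
    n = len(gradients)
    # Rank instances by gradient magnitude (large |g| = poorly fitted instance).
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    # Randomly sample among the small-gradient instances.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Amplify sampled small-gradient instances to compensate for subsampling.
    amplify = (1.0 - top_rate) / other_rate
    return [(i, 1.0) for i in top] + [(i, amplify) for i in sampled]


# For logistic loss the gradient of each instance is (predicted probability - label).
grads = [0.9, -0.8, 0.1, 0.05, -0.02, 0.3, 0.01, -0.4, 0.2, 0.0]
print(goss_sample(grads, top_rate=0.2, other_rate=0.1))
```

Only the returned instances, with their weights, would then be used to evaluate candidate splits for the next tree.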
Results: For the prediction of diabetes from the available data, the model achieved an AUC of 0.87337 on the training set and an AUC of 0.87425 on the test set.
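We selected the area under the receiver operating characteristic curve (AUC) as the performance measure because it summarizes the trade-off between the true positive rate and the false positive rate across all classification thresholds; the higher the AUC, the better the model discriminates between patients with and without diabetes.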
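As a concrete reading of the metric, the AUC equals the probability that the model assigns a higher score to a randomly chosen patient with diabetes than to a randomly chosen patient without it, with ties counting as one half. The following minimal sketch computes it directly from that definition using only the standard library; the function name `roc_auc` is illustrative. The abstract does not state how the reported AUCs were computed, and this pairwise version is quadratic in the number of patients, so a library routine would be the practical choice at the scale of this dataset.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive correctly ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # → 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values near 0.87 indicate good, though not perfect, discrimination.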
Conclusion: Machine learning screening tools, such as the model we developed for diabetes, can be accurate and effective in diagnosing an underlying condition or identifying at-risk patients.