Chaitanya Mamillapalli, MD,MRCP,FAPCR – Endocrinologist, SPRINGFIELD CLINIC

Shaun C. Tonstad – President, Clarion Group, Inc.

Daniel J. Fox, MPH, PhD – Director of Clinical Research, Director

Geoffrey W. Rutledge, MD, PhD – Chief Medical Officer, HealthTap

David B. Graham, MD – Senior Vice President and Chief Information Officer/Chief Medical Information Officer, Memorial Health System

Michael Jakoby, IV, MD/MA – Associate Professor of Medicine and Chief, Division of Endocrinology, SIU School of Medicine

Objective :

The prevalence of undiagnosed type 2 diabetes mellitus is high, with estimates of over 7 million patients in the United States and nearly 175 million cases globally.  This study evaluated a machine-learning model for screening electronic health records (EHR) data to identify potential patients with undiagnosed type 2 diabetes mellitus. 

Methods :

A supervised decision jungle binary classifier was built from the most recently reported de-identified patient data, excluding glycemic measures, sourced from a large multi-specialty clinic’s EHR. Patient data were split 50/50 into training and validation datasets. The training dataset was used to train the model to identify patients at high risk of type 2 diabetes mellitus (T2DM) from nine EHR measures: 1. Age, 2. Gender, 3. Race, 4. Body Mass Index, 5. Blood Pressure, 6. Creatinine, 7. Triglycerides, 8. Family History of Diabetes, and 9. Tobacco Use. Using a two-class decision jungle algorithm, each patient was assessed against a binary diagnosis scenario (diabetic or not diabetic), with diagnosis of T2DM defined as a random glucose > 140 mg/dL and/or HbA1c > 6.5%. The validation set was then used to determine the model’s predictive accuracy.
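The labeling rule and the 50% split described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names (`has_t2dm`, `split_half`), the handling of missing values, and the seeded shuffle are assumptions introduced here. The classifier itself is omitted, since the abstract does not describe its configuration.

```python
import random


def has_t2dm(random_glucose_mg_dl=None, hba1c_pct=None):
    """Reference label from the abstract's definition:
    random glucose > 140 mg/dL and/or HbA1c > 6.5%.
    Treating a missing value as "not positive" is an assumption."""
    glucose_positive = random_glucose_mg_dl is not None and random_glucose_mg_dl > 140
    hba1c_positive = hba1c_pct is not None and hba1c_pct > 6.5
    return glucose_positive or hba1c_positive


def split_half(records, seed=0):
    """Shuffle and cut the records into two halves (training, validation).
    The fixed seed is a hypothetical choice for reproducibility."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Usage with placeholder patient IDs standing in for the 85,719 complete records.
training, validation = split_half(range(85_719))
print(len(training), len(validation))
```

Because 85,719 is odd, a strict halving leaves the two sets differing by one record.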

Results :

The initial sample size totaled 618,022 subjects. Incomplete data resulted in exclusion of 532,303 subjects. The remaining 85,719 subjects were segmented into equal training and validation datasets. After training the model on the training dataset, T2DM was identified in the validation dataset with positive predictive value (precision) of 0.686 and negative predictive value (recall) of 0.65.  Area under the curve (AUC) and F-score for predictive accuracy were 0.72 and 0.77, respectively.     
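For reference, the cohort arithmetic and the standard definitions of the reported metrics can be written out as follows. This is a sketch under stated assumptions: the confusion-matrix counts are hypothetical (the abstract does not report them), and `screening_metrics` is a name introduced here. Note that under these standard definitions precision is the positive predictive value, recall is the sensitivity, and the negative predictive value is a separate quantity, so the sketch computes all three.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)   # positive predictive value
    recall = tp / (tp + fn)      # sensitivity
    npv = tn / (tn + fn)         # negative predictive value
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "npv": npv, "f1": f1}


# Cohort arithmetic taken directly from the figures in the abstract.
initial, excluded = 618_022, 532_303
retained = initial - excluded          # subjects with complete data
retained_fraction = retained / initial

# Hypothetical counts, for illustration only (not from the study).
metrics = screening_metrics(tp=60, fp=40, tn=80, fn=20)
print(retained, round(retained_fraction, 3), metrics)
```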

Discussion :

This study demonstrates the feasibility of using a machine-learning assessment tool to identify patients at high risk of T2DM from evaluation of demographic, clinical, and non-glycemic laboratory parameters commonly found in electronic health records.  After additional work to refine the model and improve predictive accuracy, this machine-learning tool may prove valuable in helping clinical practices screen large patient populations to identify individuals in need of diabetes screening. 

Conclusion :

Initial results indicate that a decision jungle binary classifier can be developed into a screening tool that accurately identifies patients at high risk of undiagnosed T2DM who would benefit from glycemic screening.
