Hyojun Ada Lee
Senior Data Scientist
S&P Global Corporate, United States
Disclosure information not submitted.
Heliodoro Tejedor Navarro
Senior Developer
Northwestern Institute on Complex Systems, United States
Disclosure information not submitted.
Olivia Keaveny
Research Manager
NorthShore University HealthSystem, United States
Disclosure information not submitted.
Curtis Weiss
Assistant Professor
NorthShore University HealthSystem, United States
Disclosure information not submitted.
Title: Machine learning to characterize chest imaging reports consistent with ARDS
Introduction: Acute respiratory distress syndrome (ARDS) can be difficult to recognize, and delays in recognition can be deadly. A pillar of ARDS diagnosis is the presence of bilateral infiltrates on chest imaging. However, reports describe this finding in diverse language, so classification by traditional keyword matching can be insufficient. Here, we develop a machine learning (ML) model to automatically characterize imaging reports as consistent or inconsistent with ARDS.
Methods: We studied three corpora of chest imaging reports from intubated adults with acute hypoxemic respiratory failure. Corpora 1 and 2 came from the same medical system, from 2013 and 2016, respectively, while Corpus 3 came from a different medical center, from 2017. Trained research personnel characterized each imaging report as consistent with ARDS or not. We preprocessed reports to remove patient information and non-informative sections, then tokenized and vectorized the text. Four ML models (decision trees, logistic regression, random forest classifiers, and extreme gradient boosting ‘XGBoost’) were trained on Corpus 1 with an 80%/20% training/validation split. The XGBoost model trained on Corpus 1 was tested on Corpus 2, and vice versa; both models were then tested on Corpus 3.
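The preprocessing, tokenization, vectorization, and 80/20 split described in the Methods can be illustrated with a minimal standard-library sketch. The abstract does not specify which sections were removed, which tokenizer or vectorizer was used, or how the split was drawn, so the header list, the bag-of-words counts, the random record-level split, and the example reports below are all assumptions made for illustration only.

```python
import random
import re

# Hypothetical section headers treated as non-informative (not from the abstract).
NON_INFORMATIVE = ("CLINICAL HISTORY:", "COMPARISON:", "TECHNIQUE:")


def preprocess(report):
    """Drop lines that start with a non-informative header, then lowercase."""
    kept = [
        line for line in report.splitlines()
        if not line.strip().upper().startswith(NON_INFORMATIVE)
    ]
    return " ".join(kept).lower()


def tokenize(text):
    """Split text into runs of lowercase letters (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text)


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(tokens, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def train_validation_split(items, validation_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and hold out a fraction for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Invented example reports; label 1 = consistent with ARDS, 0 = not.
reports = [
    ("CLINICAL HISTORY: hypoxemia\nFINDINGS: Bilateral airspace opacities.", 1),
    ("COMPARISON: none\nFINDINGS: Lungs are clear.", 0),
    ("FINDINGS: Diffuse bilateral infiltrates.", 1),
    ("FINDINGS: Right lower lobe consolidation only.", 0),
    ("FINDINGS: Patchy opacities in both lungs.", 1),
]

train, validation = train_validation_split(reports)
train_tokens = [tokenize(preprocess(text)) for text, _ in train]
vocab = build_vocabulary(train_tokens)  # vocabulary is built from training data only
X_train = [vectorize(toks, vocab) for toks in train_tokens]
y_train = [label for _, label in train]
```

The resulting count vectors and labels could then be passed to any of the four classifier families named above. Building the vocabulary from the training portion only keeps validation text from influencing the features.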
Results: Corpus 1 comprised 5783 records from 989 patients with 58% consistent with ARDS, Corpus 2 comprised 6041 records from 749 patients with 44% potentially consistent with ARDS, and Corpus 3 comprised 629 records from 90 patients with 34% consistent with ARDS. Areas under the receiver operating characteristic curve (AUROCs) for models on test sets from Corpus 1 were 0.81 for decision trees, 0.86 for logistic regression, 0.92 for random forest classifiers with a depth of 10, and 0.93 for extreme gradient boosting. The AUROC was 0.90 for the Corpus 1-trained XGBoost model run on the Corpus 2 reports and 0.88 vice versa. We further tested both the Corpus 1 and Corpus 2 XGBoost models on Corpus 3, which was from a different hospital system, with AUROCs of 0.85 and 0.91, respectively.
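For readers less familiar with the metric, the AUROC can be read as the probability that a randomly chosen report consistent with ARDS receives a higher model score than a randomly chosen inconsistent report, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The function name and the labels and scores are illustrative and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for reports consistent with ARDS, 0 otherwise.
    scores: model scores, where higher means more likely consistent.
    Tied pairs contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 8 of the 9 positive/negative pairs are ordered correctly.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
example_auroc = auroc(example_labels, example_scores)  # 8/9, about 0.89
```

The same calculation applies whether the scored reports come from a held-out split of the training corpus or from a different corpus. The pairwise loop is quadratic in the number of records, so a rank-based implementation would be preferable for corpora of several thousand reports.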
Conclusion: ML can accurately identify chest imaging reports that are consistent with ARDS. Models derived from one hospital system can still accurately identify reports consistent with ARDS from an entirely different hospital system. This approach has the potential to support rapid recognition of ARDS and improve patient care.