Applying Data Science methods and tools to unveil healthcare use of lung cancer patients in a teaching hospital in Spain

Clin Transl Oncol. 2019 Nov;21(11):1472-1481. doi: 10.1007/s12094-019-02074-2. Epub 2019 Mar 12.

Abstract

Purpose: Our primary goal was to study outpatient attendance by lung cancer patients at Hospital Universitario Puerta de Hierro Majadahonda (HUPHM), Spain, by leveraging our Electronic Patient Record (EPR) and a structured clinical registry of lung cancer cases, and to assess current Data Science methods and tools.

Methods/patients: We applied the Cross-Industry Standard Process for Data Mining (CRISP-DM) to integrate and analyze activity data extracted from the EPR (9.3 million records) together with clinical data on lung cancer patients from a previous registry, which was curated into a new structured database built on REDCap. We described and quantified the factors influencing outpatient care use from univariate and multivariate perspectives, using Poisson and negative binomial regression.
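Because the regression step involves both Poisson and negative binomial models, a quick way to see why both are considered is to check per-patient visit counts for overdispersion. The sketch below is illustrative only and is not the authors' analysis: it assumes visit counts have already been aggregated per patient, uses a hypothetical function name `dispersion_summary`, and estimates the negative binomial (NB2) dispersion parameter by the method of moments rather than by the maximum likelihood a regression package would use.

```python
from statistics import mean, pvariance


def dispersion_summary(counts):
    """Summarize overdispersion in per-patient visit counts (hypothetical helper).

    A Poisson model assumes variance == mean. A variance well above the
    mean (overdispersion) is the usual motivation for a negative binomial
    model, whose NB2 form has variance = mu + alpha * mu**2.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    var = pvariance(counts, mu)  # population variance around the sample mean
    ratio = var / mu  # about 1 under Poisson, > 1 when overdispersed
    # Method-of-moments estimate of the NB2 dispersion parameter alpha,
    # clipped at 0 when the data show no overdispersion.
    alpha = max((var - mu) / mu**2, 0.0)
    return mu, var, ratio, alpha


# Hypothetical visit counts for four patients (not data from the study).
mu, var, ratio, alpha = dispersion_summary([0, 2, 4, 10])
print(f"mean={mu}, variance={var}, ratio={ratio:.2f}, alpha={alpha:.3f}")
```

For these hypothetical counts the variance (14) is 3.5 times the mean (4), the kind of overdispersion that makes a negative binomial model the safer choice; a ratio near 1 would leave Poisson regression adequate. In a full multivariate model the check would be made conditional on covariates, for example via the Pearson chi-square statistic divided by the residual degrees of freedom.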

Results: Three CRISP-DM cycles were performed, resulting in a curated database of 522 lung cancer patients described by 133 variables; these patients generated 43,197 outpatient visits and tests, 1538 emergency room (ER) visits and 753 inpatient admissions. Stage and ECOG-PS at diagnosis and the Charlson Comorbidity Index were major contributors to healthcare use. We also found that the patients' pattern of healthcare use (even before diagnosis), a history of cancer in first-degree relatives, smoking habits, and even age at diagnosis could play a relevant role.
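Utilization totals like these come from linking EPR activity records to the registry cohort. A minimal sketch of that linkage, assuming (hypothetically) that each activity record is a `(patient_id, encounter_type)` pair and that the registry maps patient IDs to clinical variables; the function name `per_patient_use`, the encounter labels and the fields `stage` and `ecog` are illustrative, not the authors' schema.

```python
from collections import Counter, defaultdict

# Hypothetical encounter labels; the real EPR activity coding is not described here.
ENCOUNTER_TYPES = ("outpatient", "er", "inpatient")


def per_patient_use(activity, registry):
    """Link EPR activity records to registry patients and count encounters.

    activity: iterable of (patient_id, encounter_type) pairs.
    registry: dict mapping patient_id -> dict of clinical variables.
    Returns one row per registry patient with clinical variables and counts.
    """
    counts = defaultdict(Counter)
    for patient_id, kind in activity:
        # Keep only cohort patients and encounter types of interest.
        if patient_id in registry and kind in ENCOUNTER_TYPES:
            counts[patient_id][kind] += 1

    table = []
    for patient_id in sorted(registry):
        row = {"patient_id": patient_id, **registry[patient_id]}
        for kind in ENCOUNTER_TYPES:
            # Counter returns 0 for missing keys, so patients with no
            # encounters of a given type get an explicit zero.
            row[kind] = counts[patient_id][kind]
        table.append(row)
    return table


# Hypothetical records: P9 is outside the cohort, "pharmacy" is not counted.
activity = [
    ("P1", "outpatient"), ("P1", "outpatient"), ("P1", "er"),
    ("P2", "inpatient"), ("P9", "outpatient"), ("P2", "pharmacy"),
]
registry = {
    "P1": {"stage": "IV", "ecog": 1},
    "P2": {"stage": "I", "ecog": 0},
    "P3": {"stage": "II", "ecog": 0},
}
for row in per_patient_use(activity, registry):
    print(row)
```

Registry patients with no recorded encounters are kept with zero counts, because dropping them would bias count models toward higher use; records for patients outside the cohort, and encounter types not of interest, are discarded before counting.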

Conclusions: Integrating EPR activity data with structured clinical data from lung cancer patients, and applying CRISP-DM, allowed us to describe healthcare use in relation to clinical variables, which could help plan resources and improve quality of care.

Keywords: CRISP-DM; Data Science; Data mining; Healthcare use; Lung cancer.

MeSH terms

  • Age Factors
  • Ambulatory Care / statistics & numerical data*
  • Analysis of Variance
  • Data Mining / methods*
  • Data Mining / standards
  • Data Science / methods*
  • Databases, Factual / statistics & numerical data
  • Electronic Health Records
  • Emergency Service, Hospital / statistics & numerical data
  • Female
  • Health Services Needs and Demand / statistics & numerical data*
  • Hospitalization / statistics & numerical data
  • Hospitals, Teaching
  • Humans
  • Lung Neoplasms / pathology
  • Lung Neoplasms / therapy*
  • Male
  • Registries
  • Regression Analysis
  • Spain