Merging Children's Oncology Group Data with an External Administrative Database Using Indirect Patient Identifiers: A Report from the Children's Oncology Group

PLoS One. 2015 Nov 25;10(11):e0143480. doi: 10.1371/journal.pone.0143480. eCollection 2015.

Abstract

Purpose: Clinical trial data from National Cancer Institute (NCI)-funded cooperative oncology group trials could be enhanced by merging them with external data sources. Merging without direct patient identifiers would provide additional privacy protection for patients. We sought to develop and validate a matching algorithm that uses only indirect patient identifiers.

Methods: We merged data from two Phase III Children's Oncology Group (COG) trials for de novo acute myeloid leukemia (AML) with the Pediatric Health Information System (PHIS) database. We developed a stepwise matching algorithm that used indirect identifiers, including treatment site, gender, birth year, birth month, enrollment year, and enrollment month. Results from the stepwise algorithm were compared with those of a direct merge that used date of birth, treatment site, and gender. The indirect merge algorithm was developed on AAML0531 and validated on AAML1031.
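The abstract does not state the order in which identifiers are relaxed, so the Python sketch below only illustrates how a stepwise match on indirect identifiers might work; it is not the authors' implementation. It assumes each record is a dict with hypothetical field names (`id`, `site`, `gender`, `birth_year`, `birth_month`, `enroll_year`, `enroll_month`), that the PHIS-side "enrollment" fields stand in for an admission date near trial enrollment, and that the hypothetical `STEPS` list runs from most to least specific. At each step a trial patient is accepted only when its key identifies exactly one unmatched record on each side; matched records are removed before the next, looser step.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]

# Hypothetical matching steps, from most to least specific (not from the paper).
STEPS: List[Tuple[str, ...]] = [
    ("site", "gender", "birth_year", "birth_month", "enroll_year", "enroll_month"),
    ("site", "gender", "birth_year", "birth_month", "enroll_year"),
    ("site", "gender", "birth_year", "birth_month"),
]


def stepwise_match(trial: List[Record], phis: List[Record],
                   steps: List[Tuple[str, ...]] = STEPS) -> Dict[object, object]:
    """Return {trial_id: phis_id}, accepting only unique 1:1 matches per step."""
    matches: Dict[object, object] = {}
    used_phis = set()
    for keys in steps:
        # Index the still-unmatched records on each side by the current key.
        t_index = defaultdict(list)
        p_index = defaultdict(list)
        for r in trial:
            if r["id"] not in matches:
                t_index[tuple(r[k] for k in keys)].append(r["id"])
        for r in phis:
            if r["id"] not in used_phis:
                p_index[tuple(r[k] for k in keys)].append(r["id"])
        # Accept a key only if it picks out exactly one record on each side.
        for key, t_ids in t_index.items():
            p_ids = p_index.get(key, [])
            if len(t_ids) == 1 and len(p_ids) == 1:
                matches[t_ids[0]] = p_ids[0]
                used_phis.add(p_ids[0])
    return matches


def rec(id_, site, gender, by, bm, ey, em) -> Record:
    """Build a toy record with the hypothetical field names used above."""
    return {"id": id_, "site": site, "gender": gender, "birth_year": by,
            "birth_month": bm, "enroll_year": ey, "enroll_month": em}


# Made-up example records.
trial = [rec("T1", 1, "F", 2005, 3, 2008, 6),
         rec("T2", 1, "M", 2004, 7, 2009, 1)]
phis = [rec("P1", 1, "F", 2005, 3, 2008, 6),   # agrees on all six fields
        rec("P2", 1, "M", 2004, 7, 2009, 2),   # enrollment month differs
        rec("P3", 2, "F", 2005, 3, 2008, 6)]   # different treatment site
result = stepwise_match(trial, phis)
print(result)  # {'T1': 'P1', 'T2': 'P2'}
```

Here T1 is matched at the first step and T2 only at the second, once enrollment month is dropped. Requiring uniqueness on both sides trades some sensitivity for fewer false matches: an ambiguous key stays unmatched unless earlier matches remove the competing candidates.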

Results: Of 415 patients enrolled on the AAML0531 trial at PHIS centers, 378 (91.1%) were successfully matched using the indirect stepwise algorithm. Comparison with the direct merge indicated that 362 (95.7%) of the matches identified by the indirect algorithm were concordant with the direct merge result. In validation on the AAML1031 trial, the indirect stepwise algorithm successfully matched 157 of 165 patients (95.2%), and 150 (95.5%) of the indirect matches were concordant with the direct matches.
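The match rate and concordance reduce to two ratios: matched patients over enrolled patients, and concordant matches over indirect matches. The Python sketch below assumes both merges are stored as hypothetical `{trial_id: phis_id}` dicts and treats the direct merge on date of birth, treatment site, and gender as the reference. An indirect match for a patient whom the direct merge did not match is counted as non-concordant, which is an assumption of this sketch rather than a rule stated in the abstract.

```python
from typing import Dict


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def merge_summary(n_enrolled: int,
                  indirect: Dict[str, str],
                  direct: Dict[str, str]) -> Dict[str, float]:
    """Compare an indirect merge against a direct (reference) merge.

    Both merges map trial patient IDs to PHIS patient IDs (hypothetical layout).
    An indirect match whose trial patient has no direct match counts as
    non-concordant (an assumption of this sketch).
    """
    n_matched = len(indirect)
    n_concordant = sum(1 for t, p in indirect.items() if direct.get(t) == p)
    return {
        "matched": n_matched,
        "match_rate": pct(n_matched, n_enrolled),
        "concordant": n_concordant,
        "concordance": pct(n_concordant, n_matched) if n_matched else 0.0,
    }


# Toy example with made-up IDs: 5 enrolled, 3 matched indirectly, 2 concordant.
indirect = {"T1": "P1", "T2": "P2", "T3": "P9"}
direct = {"T1": "P1", "T2": "P2", "T3": "P3", "T4": "P4"}
summary = merge_summary(5, indirect, direct)
print(summary)  # {'matched': 3, 'match_rate': 60.0, 'concordant': 2, 'concordance': 66.7}

# The same arithmetic applied to the AAML1031 validation counts reported above.
print(pct(157, 165), pct(150, 157))  # 95.2 95.5
```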

Conclusions: These data demonstrate that data on patients enrolled in COG clinical trials can be successfully merged with PHIS administrative data using a stepwise algorithm based on indirect patient identifiers. The merged data sets can serve as a platform for comparative effectiveness and cost-effectiveness studies.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Clinical Trials, Phase III as Topic
  • Databases, Factual*
  • Humans
  • Information Storage and Retrieval*
  • Leukemia, Myeloid, Acute
  • Medical Oncology*
  • Pediatrics*
  • Reproducibility of Results