Purpose: The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) Biopharma Collaborative (BPC) is a multi-institution effort to build a pan-cancer repository of genomic and clinical data curated from the electronic health record (EHR). For the research community to be confident that data extracted from EHR text are reliable, the approach used to ensure data quality must be transparent.
Materials and methods: Four institutions participating in AACR's Project GENIE created an observational cohort of patients with cancer for whom tumor molecular profiling data, therapeutic exposures, and treatment outcomes are available and will be shared publicly with the research community. A comprehensive approach to quality assurance included assessments of (1) feasibility of the curation model through pressure test cases; (2) accuracy through programmatic queries and comparison with source data; and (3) reproducibility via double curation and code review.
Results: Assessments of feasibility resulted in critical modifications to the curation directives. Queries and comparison with source data identified errors that were rectified via data correction and curator retraining. Assessment of intercurator reliability through double curation indicated that the curation model is reproducible.
Conclusion: The transparent quality assurance processes for the GENIE BPC data ensure that the data can be used for analyses that support clinical decision making and advances in precision oncology.
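Illustrative example: the reproducibility assessment via double curation described in the methods can be sketched as an agreement calculation between two curators who abstracted the same cases. The abstract does not state which agreement statistic was used, so the choice of percent agreement and Cohen's kappa here is an assumption, and the function names, the field being compared, and the sample values are hypothetical rather than drawn from the GENIE BPC data.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of double-curated cases on which both curators recorded the same value."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both curators must have curated the same non-empty set of cases")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: sum over categories of the product of each curator's marginal rate.
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both curators used one identical category for every case; kappa is undefined,
        # so perfect agreement is reported by convention (an assumption of this sketch).
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical double-curated field: whether a progression event was recorded (yes/no).
curator_1 = ["yes", "yes", "no", "no", "yes", "no"]
curator_2 = ["yes", "no", "no", "no", "yes", "no"]

agreement = percent_agreement(curator_1, curator_2)
kappa = cohens_kappa(curator_1, curator_2)
print(f"percent agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Reporting kappa alongside raw agreement is useful when one category dominates, because raw agreement alone can look high even if curators agree no more often than chance would predict.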