Electronic health records and administrative databases provide rich, longitudinal data for health-related research. These data cover large, diverse populations, creating excellent research opportunities, but they have limitations. In particular, information is available only for individuals who are enrolled in a particular health system; thus, studies often exclude individuals with a short enrollment history. Such cohort restriction may cause selection bias in absolute risk estimates for the full enrollee population. We used hazard ratios (HRs) to estimate the association between length of prior enrollment and cancer and all-cause mortality risk. HRs different from one indicate that restricted cohorts would produce biased risk estimates for the full enrollee population. Our study sample included 170,708 enrollees of a Western Washington healthcare delivery system. In unadjusted models, individuals with 10 or more years of prior enrollment had a higher risk of cancer and death than those with fewer than 5 years of prior enrollment (HRs ranged from 1.29 to 3.01). Age- and sex-adjusted models accounted for much of this difference (HRs: 0.93 to 1.24). Models adjusting for additional covariates gave similar results (HRs: 0.91 to 1.14). After evaluating potential selection bias, we conclude that, in this setting, age- and sex-standardizing risk estimates can remove most of the bias due to cohort restrictions requiring lengthy prior enrollment. Before generalizing estimates based on a selected sample of patients meeting prior enrollment criteria, researchers should assess the potential for selection bias.
Keywords: Selection bias; administrative databases; cancer mortality risk estimation; electronic health records; enrollment history.