Objectives: To assess the completeness and representativeness of body mass index (BMI) data in the Clinical Practice Research Datalink (CPRD), and determine an optimal strategy for their use.
Design: Descriptive study.
Setting: Electronic healthcare records from primary care.
Participants: A random sample of one million patients aged ≥16 years from the UK CPRD primary care database.
Primary and secondary outcome measures: BMI completeness in CPRD was evaluated by age, sex and calendar period. CPRD-based summary BMI statistics for each calendar year (2003-2010) were standardised by age and sex and compared with equivalent statistics from the Health Survey for England (HSE).
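As an illustration of the standardisation step, the following minimal sketch computes a directly standardised mean BMI from stratum-specific means. It assumes direct standardisation to a reference population, which the abstract does not confirm; the function name, the age and sex strata, the weights and the BMI values are all hypothetical and are not taken from the study.

```python
def directly_standardised_mean(stratum_means, standard_weights):
    """Weighted mean of stratum-specific means, using reference-population weights.

    stratum_means:    dict mapping a stratum key, e.g. (age_band, sex), to mean BMI
    standard_weights: dict mapping the same keys to the reference population's
                      share (or count) in each stratum
    """
    missing = set(standard_weights) - set(stratum_means)
    if missing:
        raise ValueError(f"No stratum mean for: {sorted(missing)}")
    total_weight = sum(standard_weights.values())
    weighted_sum = sum(stratum_means[s] * w for s, w in standard_weights.items())
    return weighted_sum / total_weight


# Hypothetical values for illustration only (not study data).
stratum_means = {
    ("16-44", "F"): 25.0,
    ("16-44", "M"): 26.0,
    ("45+", "F"): 27.0,
    ("45+", "M"): 28.0,
}
standard_weights = {
    ("16-44", "F"): 0.3,
    ("16-44", "M"): 0.3,
    ("45+", "F"): 0.2,
    ("45+", "M"): 0.2,
}
standardised_mean = directly_standardised_mean(stratum_means, standard_weights)
```

Applying the same reference weights to both data sources removes differences in mean BMI that arise only from differing age and sex structures.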
Results: BMI completeness increased over calendar time from 37% in 1990-1994 to 77% in 2005-2011, was higher among females and increased with age. When BMI at specific time points was assigned based on the most recent record, calendar-year-specific mean BMI statistics underestimated equivalent HSE statistics by 0.75-1.1 kg/m². Restriction to those with a recent (≤3 years) BMI resulted in mean BMI estimates closer to HSE (≤0.28 kg/m² underestimation), but excluded up to 47% of patients. An alternative strategy of imputing up-to-date BMI based on modelled changes in BMI over time since the last available record also led to mean BMI estimates that were close to HSE (≤0.37 kg/m² underestimation).
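The first two assignment strategies (most recent record, with or without a recency restriction) can be sketched as below. This is a hedged illustration, not the study's code: the function name, the record layout of (date, BMI) pairs and the example values are hypothetical, and a year is approximated as 365.25 days. The model-based imputation strategy is not sketched because the abstract does not specify the model.

```python
from datetime import date


def most_recent_bmi(records, index_date, max_age_years=None):
    """Return the BMI from the latest record on or before index_date.

    records:       list of (measurement_date, bmi) tuples for one patient
    index_date:    the time point at which BMI is to be assigned
    max_age_years: if given, records older than this many years before
                   index_date are treated as missing (returns None)
    """
    eligible = [(d, bmi) for d, bmi in records if d <= index_date]
    if not eligible:
        return None  # no BMI recorded by the index date
    record_date, bmi = max(eligible, key=lambda rec: rec[0])
    if max_age_years is not None:
        age_in_days = (index_date - record_date).days
        if age_in_days > max_age_years * 365.25:
            return None  # most recent record is too old to use
    return bmi


# Hypothetical patient history for illustration only.
records = [(date(2001, 5, 1), 24.1), (date(2004, 3, 15), 25.3)]

unrestricted = most_recent_bmi(records, date(2008, 7, 1))
restricted = most_recent_bmi(records, date(2008, 7, 1), max_age_years=3)
```

In this example the unrestricted strategy carries forward a measurement that is more than four years old, while the restricted strategy treats the patient as having no usable BMI, which mirrors the trade-off between accuracy and exclusion described above.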
Conclusions: Completeness of BMI in CPRD increased over time and varied by age and sex. At a given point in time, a large proportion of the most recent BMIs are unlikely to reflect current BMI; consequent BMI misclassification might be reduced by employing model-based imputation of current BMI.
Keywords: Epidemiology; Primary Care; Statistics & Research Methods.