Background: Codelists are required to extract meaningful information on characteristics and events from routinely collected health data such as electronic health records. Research using routinely collected health data relies on codelists to define study populations and variables; trustworthy codelists are therefore important. Here, we provide a checklist, in the style of commonly used reporting guidelines, to help researchers adhere to best practice in codelist development and sharing.
Methods: Based on a literature search and a workshop with researchers experienced in the use of routinely collected health data, we created a set of recommendations that (1) are broadly applicable to different datasets, research questions, and methods of codelist creation; (2) are easy for an individual researcher to follow, implement, and document; and (3) fit within a step-by-step process. We then formatted these recommendations into a checklist.
Results: We have created a 10-step checklist, comprising 28 items, with accompanying guidance on each step. The checklist advises on which metadata to provide, how to define a clinical concept, how to identify and evaluate existing codelists, how to create new codelists, and how to review, check, finalise, and publish a created codelist.
Conclusions: Use of the checklist can reassure researchers that best practice was followed during the development of their codelists, increasing trust in research that relies on these codelists and facilitating wider re-use and adaptation by other researchers.
Keywords: checklist; clinical codes; codelists; codesets; electronic health records; reporting guidance; reproducibility; valuesets.
When a person receives health care, for example when a doctor registers a diagnosis or prescribes a drug, information is recorded in the health care provider's computer system. This information is often organised in a structured way, so that each piece of information can be assigned a “code”. For example, if a person was diagnosed with type 1 diabetes, this could be recorded with the code E10 from the International Classification of Diseases, a coding system that contains codes for diseases. For type 2 diabetes the code would be E11. To use this information for research, researchers need to define which people they want to study by making a list of all the relevant codes (a “codelist”). For example, to study people with type 1 and type 2 diabetes they would need to include E10 and E11 in their codelist. The International Classification of Diseases coding system includes over 70,000 codes, and other medical dictionaries can include hundreds of thousands of codes. Codelists can therefore be long and complex to create. Although codelists are very important in ensuring that research using these data is correct, no step-by-step guidelines exist to help researchers create them. To tackle this, we created a checklist and guidance document that researchers can use to make sure they don’t miss important steps and checks while creating their codelists, and to help them share their codelists so that other researchers can re-use them. We collected recommendations that other authors have made before us, and developed detailed guidance together with experts in using these types of data for research.
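As a rough illustration of how a codelist defines a study population, the sketch below applies the two diabetes codes mentioned above to a handful of made-up records. The record layout, the patient identifiers, the function names, and the rule of matching on the three-character category (so that a more specific code such as E11.9 also counts) are all assumptions made for this example; they are not part of the checklist.

```python
# Hypothetical codelist for "type 1 or type 2 diabetes", using the two
# ICD-10 codes named in the text above.
DIABETES_CODELIST = {"E10", "E11"}

# Made-up example records as (patient_id, recorded_code). Not real data.
records = [
    ("p1", "E10"),    # type 1 diabetes
    ("p2", "I10"),    # a code that is not in the codelist
    ("p3", "E11.9"),  # a more specific type 2 diabetes code
    ("p4", "E11"),    # type 2 diabetes
]


def matches_codelist(code, codelist):
    """Return True if the code's three-character category is in the codelist.

    Assumption for this sketch: subcodes such as "E11.9" should match
    their parent category "E11". A real study must decide this explicitly.
    """
    category = code.split(".")[0]
    return category in codelist


def select_patients(records, codelist):
    """Return the sorted IDs of patients with at least one matching code."""
    return sorted(
        {patient_id for patient_id, code in records
         if matches_codelist(code, codelist)}
    )


study_population = select_patients(records, DIABETES_CODELIST)
print(study_population)  # ['p1', 'p3', 'p4']
```

Even in this toy case, a decision such as whether subcodes should match changes who enters the study, which is why documenting such choices matters.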
Copyright: © 2024 Matthewman J et al.