Medical claims databases are an important source of data for studying the characteristics and burden of diseases and for informing the development of disease management policy. Patients are usually identified in these databases by algorithms built from International Classification of Diseases codes and free-text disease names, so it is crucial to validate whether an algorithm correctly identifies the target population. This paper introduces both traditional and emerging validation methods, including machine learning, natural language processing and database linkage. We also attempt to propose a validation approach suited to the current situation in China, in order to promote the application of big data in medicine and to provide a reference for epidemiological studies based on medical claims databases in this country.
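Medical insurance databases contain a wealth of information and are an important resource for studying population disease characteristics and disease burden and for providing evidence for management policy making. In these databases, patients are usually identified by algorithms built from disease codes and disease names; validating the accuracy of the database is therefore essential for judging whether an algorithm correctly identifies the population with the disease or exposure under study. This paper introduces the traditional medical record review methods used abroad and, together with emerging auxiliary validation techniques such as machine learning, natural language processing and database linkage, explores validation methods suited to the current situation in China, to provide a reference for promoting the application of medical big data in China and for conducting research based on medical insurance databases.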
Keywords: Database linkage; Machine learning; Medical claims database; Natural language processing; Validation.