Standardizing Phenotype Variables in the Database of Genotypes and Phenotypes (dbGaP) Based on Information Models

AMIA Jt Summits Transl Sci Proc. 2013 Mar 18;2013:110. eCollection 2013.

Abstract

This paper describes an information-model-based approach to standardizing phenotype variables in dbGaP. Our attempt to reuse existing Clinical Element Models (CEM) was not successful, although CEM provided a robust means of representing clinical data. We therefore developed information models derived from phenotype variable descriptions and standardized the variables by fitting them into these models using a simple Natural Language Processing (NLP) algorithm. We report our experience applying this approach to findings-related variables, which tend to be more idiosyncratic and thus pose more challenges to standardization.