Healthcare-related data are obtained from various sources and in divergent formats, creating an emerging need for improved, automated techniques and technologies that qualify and standardize these data. This paper introduces a novel mechanism for the cleaning, qualification, and standardization of the collected primary and secondary data types. The mechanism is realized through the design and implementation of three integrated subcomponents: the Data Cleaner, the Data Qualifier, and the Data Harmonizer. These subcomponents are evaluated by performing data cleaning, qualification, and harmonization on data related to pancreatic cancer, with the goal of developing enhanced personalized risk assessment and recommendations for individuals.
Keywords: Data Qualification; Data Standardization; Healthcare Analytics.