Genome-wide association studies (GWAS) query the entire genome in a hypothesis-free, unbiased manner. Because they can identify novel genetic variants, they have become a popular approach for investigating complex diseases. However, the success of GWAS varies widely, and identifying the genetic variants that contribute to complex diseases remains difficult. We developed a novel bioinformatics approach to identify the nominal genetic variants associated with complex diseases. To test its feasibility, we developed a web-based aggregation tool that organizes the genes, genetic variations, and pathways involved in preterm birth. We used semantic data mining to extract all published articles related to preterm birth, and a team of curators reviewed every article. Genes identified from public databases and from archives of expression arrays were aggregated with the genes curated from the literature. Pathway analysis was then used to impute additional genes from the pathways identified during curation. Together, the curated articles and the collected genetic information form a unique resource for investigators interested in preterm birth. The Database for Preterm Birth exemplifies an approach that generalizes to other disorders for which there is evidence of a significant genetic contribution.