Polymorphisms in regulatory DNA regions are believed to play an important role in determining phenotype, including disease, and in providing raw material for evolution. We devised a new pipeline for the systematic identification of functional variation in human regulatory sequences. The algorithm is based on the identification of SNPs leading to significant changes in both the affinity of a regulatory region for transcription factors (TFs) and the in vivo expression of the regulated gene. We tested the algorithm by identifying SNPs leading to altered regulation by STAT3 in human promoters and introns, and experimentally validated the top-scoring ones, showing that most of the SNPs identified by the algorithm indeed correspond to differential binding of STAT3 and differential induction of the target gene upon stimulation with IL6. Using the same computational approach, we compiled a database of thousands of predicted functional regulatory SNPs for hundreds of human TFs, which we provide as online Supporting Information. We discuss possible applications to the interpretation of noncoding SNPs associated with human diseases. The method we propose and the database of predicted functional cis-regulatory polymorphisms will be useful in future studies of regulatory variation and, in particular, for interpreting the results of past and future genome-wide association studies.
© 2013 Wiley Periodicals, Inc.
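
The two-part criterion described in the abstract (a change in TF affinity together with a change in expression of the regulated gene) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy frequency matrix is invented and is not a real STAT3 matrix, the uniform background, the log-odds scoring scheme, the thresholds, and the function names (`log_odds`, `best_score`, `affinity_change`, `is_candidate`) are all hypothetical, and the expression p-value is assumed to come from some separate genotype-expression analysis.

```python
import math

# Toy position frequency matrix, one dict per motif position.
# These numbers are invented for illustration; NOT a real STAT3 matrix.
TOY_PFM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(site):
    """Log2-odds score of a site whose length equals the motif width."""
    return sum(math.log2(TOY_PFM[i][base] / BACKGROUND)
               for i, base in enumerate(site))


def best_score(seq, snp_pos):
    """Best score over all motif-width windows that cover snp_pos.

    Assumes len(seq) is at least the motif width.
    """
    width = len(TOY_PFM)
    first = max(0, snp_pos - width + 1)
    last = min(snp_pos, len(seq) - width)
    return max(log_odds(seq[s:s + width]) for s in range(first, last + 1))


def affinity_change(seq, snp_pos, alt):
    """Score difference (alternate minus reference allele) at a SNP."""
    alt_seq = seq[:snp_pos] + alt + seq[snp_pos + 1:]
    return best_score(alt_seq, snp_pos) - best_score(seq, snp_pos)


def is_candidate(delta, expr_pvalue, min_delta=2.0, max_p=0.05):
    """Flag a SNP only if BOTH criteria pass (hypothetical thresholds)."""
    return abs(delta) >= min_delta and expr_pvalue <= max_p


# Usage: reference sequence with a C -> A substitution at index 4.
delta = affinity_change("GGTTCCGG", 4, "A")
print(round(delta, 3), is_candidate(delta, expr_pvalue=0.01))
```

Requiring both conditions in `is_candidate` mirrors the abstract's emphasis on SNPs that alter binding affinity and expression together; a real analysis would replace the toy matrix and the fixed thresholds with properly estimated models and significance tests.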