Gene set analysis provides a method to generate statistical inferences across sets of linked genes, primarily using high-throughput expression data. Common gene sets include biological pathways, operons, and targets of transcriptional regulators. In higher eukaryotes, especially in diseases with strong genetic and epigenetic components such as cancer, copy number loss and gene silencing through promoter methylation can eliminate the possibility that a gene is transcribed. These alterations can, in turn, adversely affect the estimation of transcription factor or pathway activity from a set of target genes, because some of the targets may no longer respond to transcriptional regulation. Here we introduce a simple filtering approach that removes genes from consideration if they show copy number loss or promoter methylation, and we demonstrate that it improves inference of transcription factor activity in a simulated dataset based on the background expression observed in normal head and neck tissue.
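The filtering step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the input layout (per-gene log2 copy ratios and promoter methylation beta values), the cutoffs of -0.3 and 0.5, the choice to keep genes with missing measurements, and the gene labels are all assumptions made for illustration.

```python
from statistics import mean


def filter_gene_set(gene_set, copy_number, methylation,
                    cn_loss_cutoff=-0.3, meth_cutoff=0.5):
    """Drop genes showing copy number loss or promoter methylation.

    copy_number: gene -> log2 copy ratio (assumed input format)
    methylation: gene -> promoter beta value in [0, 1] (assumed input format)
    Both cutoffs are illustrative defaults, not values from the paper.
    Genes missing from either mapping are kept (an assumption).
    """
    kept = []
    for gene in gene_set:
        cn = copy_number.get(gene)
        beta = methylation.get(gene)
        lost = cn is not None and cn <= cn_loss_cutoff
        silenced = beta is not None and beta >= meth_cutoff
        if not (lost or silenced):
            kept.append(gene)
    return kept


def mean_expression(genes, expression):
    """Toy activity summary: mean expression over the given genes."""
    return mean(expression[g] for g in genes)


# Hypothetical targets of one transcription factor in a single sample.
targets = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
copy_number = {"GENE_A": 0.0, "GENE_B": -1.0, "GENE_C": 0.1, "GENE_D": 0.0}
methylation = {"GENE_A": 0.1, "GENE_B": 0.2, "GENE_C": 0.9, "GENE_D": 0.2}
expression = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 6.0}

kept = filter_gene_set(targets, copy_number, methylation)
unfiltered_score = mean_expression(targets, expression)
filtered_score = mean_expression(kept, expression)
```

In this toy sample, GENE_B (copy number loss) and GENE_C (promoter methylation) are removed, so the low expression of genes that cannot respond to the regulator no longer pulls down the summary for the remaining targets. Any gene set statistic could be substituted for the simple mean used here.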
Keywords: copy number variations; gene set analysis; promoter methylation; simulated dataset; transcription factor gene sets.
© 2014 Wiley Periodicals, Inc.