Motivation: An important task in computational biology is to infer, using background knowledge and high-throughput data sources, models of cellular processes such as gene regulation. Nachman et al. have developed an approach to inferring gene-regulatory networks that represents quantitative transcription rates, and simultaneously estimates both the kinetic parameters that govern these rates and the activity levels of the unobserved regulators that control them. This approach is appealing because it describes how a gene's regulators influence its level of expression in more detail, and more realistically, than alternative methods. We extend this approach by representing and learning the key kinetic parameters as functions of features in the genomic sequence. The primary motivation for our extension is that it provides a more mechanistic representation of the regulatory relationships being modeled.
Results: We evaluate our approach using two Escherichia coli gene-expression data sets, with a particular focus on modeling the networks that control how E. coli responds to the carbon source(s) available to it. Our results indicate that our sequence-based models achieve better predictive accuracy than similar models without sequence-based parameters, and substantially better accuracy than a simple baseline. Moreover, our approach yields models that offer more explanatory power and biological insight than models without sequence-based parameters.