In the present postgenomic era, the capacity to generate big data has far exceeded the capacity to analyze, contextualize, and make sense of the data in clinical, biological, and ecological applications. There is a great unmet need for automation and algorithms to aid in analyses of big data, in biology in particular. In this context, it is noteworthy that computational methods used to analyze the regulation of bacterial gene expression have in the past focused mainly on Escherichia coli promoters, owing to the large amount of data available. The challenges and prospects of automating the prediction and recognition of bacterial sequences as promoters have not been properly addressed, because of the promoter size and degenerate pattern. We report here an original neural network approach for recognition and prediction of Bacillus subtilis promoters. The artificial neural network used 767 B. subtilis promoter sequences as input, and we also aimed to identify the architecture that provides the best prediction. Two multilayer perceptron neural network architectures offered the highest accuracy: one with five neurons and another with seven neurons in the hidden layer. These architectures achieved accuracies of 98.57% and 97.69%, respectively. The results collectively indicate the promise of neural network approaches for the B. subtilis promoter recognition problem, while also suggesting the broader potential of algorithms for automation of data analyses in the postgenomic era.
Keywords: Bacillus subtilis; algorithms; artificial neural network; automation; computational biology; genome annotation.