Gene finding is the process of identifying genome sequence regions representing stretches of DNA that encode biologically active products, such as proteins or functional noncoding RNAs. As this is usually the first step in the analysis of any novel genomic sequence or resequenced sample of well-known organisms, it is a very important issue, as all downstream analyses depend on the results. This chapter describes the biological basis for gene finding, and the programs and computational approaches that are available for the automated identification of protein-coding genes. For bacterial, archaeal, and eukaryotic genomes, as well as for multi-species sequence data originating from environmental community studies, the state of the art in automated gene finding is described.
Keywords: Environmental sequence samples; Gene finding; Gene prediction; Genomic sequence; Next-generation sequencing; Protein-coding sequences.