Computational methods for identifying selenoproteins have advanced rapidly in recent years. However, identifying the open reading frame (ORF) of a eukaryotic selenoprotein gene remains difficult, because the TGA codon that encodes the selenocysteine (Sec) residue in the active center of a selenoprotein normally serves as a termination signal for protein translation. This chapter presents SelGenAmic, a gene assembly algorithm for identifying selenoprotein genes in eukaryotic genomes. The method based on this algorithm builds an optimal TGA-containing ORF for each TGA in a genome, then screens these ORFs for selenoprotein genes by protein similarity analysis using conserved sequence alignments. Because every TGA in the genome is examined for its potential to be decoded as a Sec residue, the method improves the sensitivity of selenoprotein detection. The SelGenAmic-based method is thus capable of identifying eukaryotic selenoprotein genes from genomic sequences.
Keywords: Gene assembly algorithm; Selenocysteine; Selenoprotein.
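
The idea of examining every TGA as a possible Sec codon can be illustrated with a minimal sketch. It is not the SelGenAmic algorithm itself: it assumes a single exon on the forward strand, performs no splice-site assembly or scoring for an "optimal" ORF, and the function name `tga_containing_orfs` is hypothetical. For each TGA it reads through that one codon, extends in frame to the flanking stop codons, and keeps the ORF only if an in-frame ATG lies upstream.

```python
STOPS = {"TAA", "TAG", "TGA"}

def tga_containing_orfs(seq):
    """Return (orf_start, orf_end, tga_pos) for each TGA read as Sec.

    Simplified illustration: forward strand, single exon, no scoring.
    """
    seq = seq.upper()
    results = []
    pos = seq.find("TGA")
    while pos != -1:
        # Walk upstream in frame until a stop codon or the sequence start.
        start = pos
        while start - 3 >= 0 and seq[start - 3:start] not in STOPS:
            start -= 3
        # Walk downstream in frame until the next stop codon.
        end = pos + 3
        while end + 3 <= len(seq) and seq[end:end + 3] not in STOPS:
            end += 3
        has_stop = end + 3 <= len(seq) and seq[end:end + 3] in STOPS
        # First in-frame ATG between the upstream boundary and the TGA.
        atg = next(
            (i for i in range(start, pos, 3) if seq[i:i + 3] == "ATG"),
            None,
        )
        if atg is not None and has_stop:
            # Include the terminating stop codon in the reported ORF.
            results.append((atg, end + 3, pos))
        pos = seq.find("TGA", pos + 1)
    return results
```

For example, `tga_containing_orfs("ATGGCCTGAGGGTAA")` returns `[(0, 15, 6)]`: the TGA at position 6 is read through, and the ORF runs from the ATG at position 0 to the TAA stop codon. In the method described above, candidate ORFs of this kind would then be screened by protein similarity analysis.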