Abstract
We have derived a novel method for assessing compositional bias in biological sequences, based on finding the lowest-probability subsequences for a given set of residue types. As a case study, we assessed the distribution of prion-like glutamine/asparagine-rich, or (Q+N)-rich, domains, which are linked to amyloidogenesis, in budding yeast, fission yeast, and four other eukaryotes. We find more than 170 prion-like (Q+N)-rich regions in budding yeast and, strikingly, far fewer in fission yeast. In addition, some residues, such as tryptophan and isoleucine, are unlikely to form biased regions in any eukaryotic proteome.
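The idea of a lowest-probability subsequence can be illustrated with a minimal sketch. The abstract does not specify the probability model, so the code below makes its own assumptions: residues are treated as independent draws, the chance of seeing at least k residues from the chosen set in a window of length n is a binomial tail with background frequency p, and windows shorter than a hypothetical `min_len` are ignored. The function names and parameters are illustrative only and are not taken from the paper.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); assumed model, not the paper's."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lowest_probability_subsequence(seq, residues, p, min_len=10):
    """Return (probability, start, end) for the window seq[start:end]
    whose count of `residues` is least likely under the binomial model.

    p is the assumed background frequency of the residue set;
    min_len is a hypothetical lower bound on window length.
    """
    # Prefix sums let us count set members in any window in O(1).
    prefix = [0]
    for aa in seq:
        prefix.append(prefix[-1] + (1 if aa in residues else 0))

    best = (1.0, 0, 0)
    for start in range(len(seq)):
        for end in range(start + min_len, len(seq) + 1):
            k = prefix[end] - prefix[start]
            prob = binomial_tail(k, end - start, p)
            if prob < best[0]:
                best = (prob, start, end)
    return best


if __name__ == "__main__":
    # Toy sequence: a Q/N run flanked by residues outside the set.
    toy = "MKLV" * 5 + "QNQNQQNNQQNNQNQ" + "AGLK" * 5
    prob, start, end = lowest_probability_subsequence(toy, set("QN"), p=0.1)
    print(start, end, toy[start:end], prob)
```

On the toy sequence, the window returned is the 15-residue Q/N run itself. This exhaustive scan evaluates every window and is meant only to show the principle; a proteome-scale analysis would need a faster search and a background frequency estimated from the proteome in question.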
MeSH terms
- Amino Acid Motifs / genetics
- Amino Acid Sequence / genetics
- Animals
- Asparagine* / genetics
- Caenorhabditis elegans Proteins* / chemistry
- Caenorhabditis elegans Proteins* / genetics
- Computational Biology / methods
- Computational Biology / statistics & numerical data
- Drosophila Proteins* / chemistry
- Drosophila Proteins* / genetics
- Eukaryotic Cells / chemistry
- Eukaryotic Cells / metabolism
- Fungal Proteins* / chemistry
- Fungal Proteins* / genetics
- Glutamine* / genetics
- Humans
- Molecular Sequence Data
- Prions / chemistry*
- Prions / genetics
- Protein Structure, Tertiary / genetics
- Proteome / genetics*
- Saccharomycetales / genetics
- Schizosaccharomyces / genetics
- Schizosaccharomyces pombe Proteins / chemistry
- Schizosaccharomyces pombe Proteins / genetics
- Selection, Genetic*
- Sequence Analysis, Protein / methods
- Sequence Analysis, Protein / statistics & numerical data
Substances
- Caenorhabditis elegans Proteins
- Drosophila Proteins
- Fungal Proteins
- Prions
- Proteome
- Schizosaccharomyces pombe Proteins
- Glutamine
- Asparagine