Aims: To analyse the similarities between the Twisted gastrulation (TSG) proteins known to date, and to determine the phylogenetic relations among the TSG proteins and between the TSGs and two other protein families: the CCN family (for example, CCN2 (CTGF), CCN1 (CYR61), and CCN3 (NOV)) and the insulin-like growth factor binding protein (IGFBP) family.
Methods: TBLASTN and FASTA3 were used to identify new tsg genes and relatives of the TSG family. The sequences were aligned with ClustalW. The predictions of sites for signal peptide cleavage, post-translational modifications, and putative protein domains were carried out with software available at various databases. Unrooted phylogenetic trees were calculated using the UPGMA method.
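As an illustration of the tree building step, the following is a minimal sketch of UPGMA clustering in Python. It is not the software used in the study: the function name `upgma`, the taxon labels, and the pairwise distances are hypothetical, and in practice the distances would be derived from the ClustalW alignment. UPGMA repeatedly joins the two closest clusters and sets the distance from the new cluster to every other cluster to the size-weighted average of the two old distances.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between labels a and b.
    Returns (tree, merges): the tree as nested tuples, and a list of
    (subtree, height) pairs in the order the clusters were joined.
    """
    # Each cluster id maps to (subtree, number of taxa in the subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {
        frozenset((i, j)): dist[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        # Join the pair of clusters separated by the smallest distance.
        pair = min(d, key=d.get)
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        height = d.pop(pair) / 2  # the new node sits at half the distance
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the two distances it replaces.
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        merges.append(((tree_i, tree_j), height))
        next_id += 1
    (tree, _), = clusters.values()
    return tree, merges


# Hypothetical distances between four made-up sequences (illustration only).
example_names = ["A", "B", "C", "D"]
example_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
example_tree, example_merges = upgma(example_names, example_dist)
```

Because UPGMA averages distances and places each new node at half the joining distance, it implicitly assumes a roughly constant rate of change across lineages; the sketch makes that assumption too.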
Results: Several tsg genes from vertebrates and invertebrates were compared. Alignment of protein sequences revealed a highly conserved family of TSG proteins present in both vertebrates and invertebrates, whereas the slightly less well conserved IGFBP and CCN proteins are apparently present only in vertebrates. The TSG proteins display strong homology among themselves and they are composed of a putative signal peptide at the N-terminus followed by a cysteine rich (CR) region, a conserved domain devoid of cysteines, a variable midregion, and a C-terminal CR region. The most striking similarity between the TSGs and the IGFBP and CCN proteins occurs in the N-terminal conserved cysteine rich domain and the characteristic 5' cysteine rich domain(s), spacer region, and 3' cysteine rich domain structure.
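The domain layout described above (cysteine rich regions flanking a segment devoid of cysteines) can be located approximately by scanning a protein sequence for windows with many cysteines. The sketch below is only an illustration and is not the prediction software used in the study: the function name `cysteine_rich_windows`, the window length, the cysteine threshold, and the toy sequence are all assumptions.

```python
def cysteine_rich_windows(seq, window=20, min_cys=4):
    """Return merged (start, end) spans, 0-based and half-open, covering
    every window of length `window` that holds at least `min_cys` cysteines."""
    seq = seq.upper()
    spans = []
    last_start = max(len(seq) - window, 0)
    for start in range(last_start + 1):
        if seq[start:start + window].count("C") >= min_cys:
            end = min(start + window, len(seq))
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Synthetic toy sequence (not a real TSG protein): a cysteine rich start,
# a cysteine-free middle, and a cysteine rich end.
toy_seq = "CACACACA" + "G" * 30 + "CCCC"
toy_spans = cysteine_rich_windows(toy_seq, window=8, min_cys=4)
```

With these arbitrary settings the toy sequence yields one span at each terminus and nothing in the cysteine-free middle; real sequences would need the window and threshold tuned by eye against the alignment.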
Conclusion: The family of highly conserved TSG proteins, together with the IGFBP and CCN families, constitutes an emerging multigene superfamily of secreted cysteine rich factors. The TSG branch of the superfamily appears to pre-date the others because it is present in all species examined, whereas the CCN and IGFBP genes are found only in vertebrates.