Nucleotide sequence of the murine leukaemia virus amphotropic strain 4070A integrase (IN) coding region and comparative structural analysis of the inferred polypeptide

Arch Virol. 1997;142(9):1757-70. doi: 10.1007/s007050050195.

Abstract

The complete nucleotide sequence of the integrase (IN) protein coding region of the murine leukaemia virus (MLV) amphotropic strain 4070A is presented. The sequence comprises 1,224 nucleotides, encoding a 408-residue polypeptide of M(r) 46,312. Alignment of the inferred 4070A IN amino acid sequence with the IN proteins of other MLVs showed that substitutions are confined largely to segments within the N- and C-terminal domains. In the N-terminal domain the majority of substitutions occur as contiguous blocks of 2 to 6 residues, whereas in the C-terminal domain they occur as isolated single-residue changes, except within a short segment characterized by deletions/insertions. Selection appears to act on the C-terminal 19 residues of IN rather than on the N-terminal residues of ENV (the two are encoded by overlapping reading frames), suggesting a functional role for this segment. Phylogenetic analyses grouped the sequences into two clusters, one comprising IN from the amphotropic strain 4070A and three ecotropic MLVs (CAS-BR-E, Moloney and Friend), the other consisting of IN from three ecotropic MLVs (two radiation-induced viruses and AKV) and a mink cell focus-forming (MCF) MLV. The same dichotomy and cluster composition were obtained from analysis of the long terminal repeat (LTR) regions of these viruses (consistent with the functional interrelationship of IN and LTR) but not from analysis of envelope protein sequences (consistent with the functional independence of ENV proteins from both IN and LTR). Secondary structure predictions supported features determined from the catalytic domains of human immunodeficiency virus and avian sarcoma virus IN, and identified probable structures within the relatively long N- and C-terminal domains of MLV IN proteins.
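The length figures quoted in the abstract are tied together by the triplet code: 1,224 nucleotides divided by 3 gives 408 codons. The Python sketch below checks that arithmetic and derives an average mass per residue from the quoted M(r). The function name and constants are illustrative and do not come from the paper. The sketch assumes the quoted 1,224 nucleotides exclude a stop codon, which is consistent with the 408-residue figure.

```python
def residues_from_nucleotides(n_nt: int, includes_stop: bool = False) -> int:
    """Return the number of encoded residues for a coding region of n_nt nucleotides.

    Hypothetical helper for illustration: each codon is 3 nucleotides, and a
    stop codon (if counted in n_nt) does not encode a residue.
    """
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError("coding region length must be a positive multiple of 3")
    codons = n_nt // 3
    return codons - 1 if includes_stop else codons


# Values quoted in the abstract.
IN_NUCLEOTIDES = 1224   # length of the IN coding region
IN_MR = 46312           # relative molecular mass of the inferred polypeptide

n_residues = residues_from_nucleotides(IN_NUCLEOTIDES)

# Average mass per residue implied by the two quoted figures.
mean_residue_mass = IN_MR / n_residues

print(n_residues)
print(round(mean_residue_mass, 1))
```

The implied average of roughly 113.5 per residue is a derived figure, not one reported by the authors; it serves only as a consistency check between the quoted length and mass.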

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Base Sequence
  • Binding Sites
  • Conserved Sequence
  • Gene Products, env / chemistry
  • Genes, env
  • Genes, pol*
  • Integrases / chemistry
  • Integrases / genetics*
  • Leukemia Virus, Murine / enzymology
  • Leukemia Virus, Murine / genetics*
  • Leukemia Virus, Murine / physiology
  • Molecular Sequence Data
  • Mutation
  • Phylogeny
  • Protein Structure, Secondary
  • Repetitive Sequences, Nucleic Acid
  • Sequence Alignment

Substances

  • Gene Products, env
  • Integrases

Associated data

  • GENBANK/U87552