Today the retroviral integration reaction is probably understood, both in terms of its genetics and chemistry, in as much detail as any eukaryotic recombination process. That understanding is in part due to its high efficiency (for it can be induced to occur synchronously in every cell of a culture); to its simplicity (for there is only one major protein player); to its accessibility (for the viral genome has provided all the cis- and trans-acting players); and to its willingness to perform well in vitro, ultimately with purified components. The process has thus made the classic transition from a phenomenon to be studied genetically to a reaction that can also be studied biochemically. The next advances in our understanding of the process of retroviral integration are likely to center on chemical issues. Some basic enzymological issues need to be addressed: we need to determine the oligomeric state of the native IN protein; its state when bound to linear viral DNA; the residues at the active site; the residues involved in sequence-specific recognition of DNA; and the points of contact between IN monomers. Much of this information will follow from detailed mutagenesis of expressed IN genes. A crucial step will be the determination of the structure of the IN protein at atomic resolution through X-ray diffraction analysis of protein crystals, a project underway in several laboratories. That structure may immediately suggest how the enzyme contacts and joins two DNA molecules, and will enormously facilitate the design and interpretation of mutational studies. It seems plausible that we can come to understand the IN protein as a machine as thoroughly as we understand any nuclease or recombinase. A significant number of larger biological questions about integration remain unanswered and will require genetic approaches. What is the true structure of the preintegration complex in the cytoplasm? How does the complex enter the nucleus, and obtain access to the host DNA?
Why, at least for most viruses in most cells, does integration depend on cell division? Why does efficient expression of the viral DNA to form progeny viral RNA and proteins depend on integration? How are target sites for integration on the host genome selected, and why are there "hot spots" for insertion? Are there host proteins that facilitate or participate in the integration reaction itself, and what are those proteins? Are any of those proteins involved in site selection? (ABSTRACT TRUNCATED AT 400 WORDS)