Efficient (linear-time) algorithms are described for identifying global features of molecular sequences while allowing for errors. The features include repeats, matches between sequences, dyad symmetry pairings, and other sequence patterns. A multiple sequence alignment algorithm is also described. Specific applications are given to hepatitis B viruses and the J5-C (J, joining; C, constant) region of the immunoglobulin kappa gene.
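
As a rough illustration of how repeats and dyad symmetry pairings can be located in a single pass, the sketch below indexes every word of a fixed length k in a hash table. It is not the algorithm of this paper: it finds exact matches only (no errors are tolerated), its running time is linear in the sequence length only in expectation and for fixed k, and the function names and the word length k are assumptions introduced here for illustration.

```python
from collections import defaultdict

# Watson-Crick complement table for DNA (assumes an uppercase ACGT alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def index_words(seq, k):
    """Map every length-k word in seq to the list of its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def find_exact_repeats(seq, k):
    """Return the length-k words that occur more than once, with their positions."""
    return {w: p for w, p in index_words(seq, k).items() if len(p) > 1}


def find_dyad_pairings(seq, k):
    """Return sorted (i, j) pairs, i < j, where the word at j is the
    reverse complement of the word at i (an exact dyad symmetry pairing)."""
    positions = index_words(seq, k)
    pairs = []
    for word, starts in positions.items():
        partner = word.translate(COMPLEMENT)[::-1]  # reverse complement
        for i in starts:
            for j in positions.get(partner, []):
                if i < j:
                    pairs.append((i, j))
    return sorted(pairs)


repeats = find_exact_repeats("ACGTACGT", 4)
dyads = find_dyad_pairings("AACGTT", 3)
```

Matches between two sequences can be found the same way by indexing one sequence and looking up the words of the other. Tolerating errors, as the algorithms described here do, requires more than this exact-word lookup.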