A molecular epidemiology study was conducted among more than 100 human immunodeficiency virus type 1 (HIV-1) subtype C-seropositive intravenous drug users (IDUs) from China. Genotyping based on the envelope C2V3 coding region revealed the highest homology of the most prevalent virus strains circulating throughout China to subtype C sequences of Indian origin. Based on these results, a virtually full-length genome representing the most prevalent class of clade C strains circulating throughout China was directly amplified from peripheral blood mononuclear cells of a selected HIV-infected IDU and subcloned. Sequence analysis identified a mosaic structure, suggesting extensive intersubtype recombination events between genomes of the prevalent clade C and (B')-subtype Thai virus strains of that geographic region. Recombinant Identification Program analysis and phylogenetic bootstrapping suggested 10 breakpoints, located (i) in the gag-pol coding region, (ii) in vpr and at the 3' end of the vpu gene, and (iii) in the nef open reading frame. (B')-sequences therefore include (i) several insertions in the gag-pol coding region; (ii) 3'-vpr, the complete vpu gene, and the first exons of tat and rev; and (iii) the 5' half of the nef gene. Breakpoints located in the vpr/vpu coding region as well as in the nef gene of 97cn54 were found at almost identical positions in all subtype C strains isolated from IDUs living in different areas of China, suggesting a common ancestor for the C/B' recombinant strains. More than 50% of well-defined subtype B-derived cytotoxic T-lymphocyte epitopes within Gag and Pol and 10% of the known epitopes in Env were found to exactly match sequences within this clade C/B' chimeric reference strain. These results may substantially facilitate a biological comparison of clade C-derived reference strains as well as the generation of useful reagents supporting vaccine-related efforts in China.