Background: PB1-F2 is a major virulence factor of influenza A. The protein is the product of an alternative reading frame in the PB1-encoding RNA segment 2. Its presence is dictated by the presence or absence of premature stop codons. This virulence factor was present in every influenza pandemic and major epidemic of the 20th century. Absence of PB1-F2 is associated with mild disease, as seen with the 2009 H1N1 ("swine flu") virus.
Results: Analysis of 8608 segment 2 sequences showed that only 8.5% have been annotated for the presence of PB1-F2. Our analysis indicates that 75% of segment 2 sequences are likely to encode PB1-F2. The two major populations of PB1-F2 have lengths of 90 and 57 amino acids, while minor populations include lengths 52, 63, 79, 81, 87, and 101. Additional possible populations include lengths 59, 69, 81, 95, and 106. Previously described sequences include only lengths 57, 87, and 90. We observed substantial variation in PB1-F2 sequences, with certain variants showing up to 35% difference from well-defined reference sequences. This dataset therefore indicates that many more variants need to be functionally characterized.
Conclusions: Our web-accessible tool PB1-F2 Finder enables scanning of influenza sequences for potential PB1-F2 protein products. It provides an initial screen and annotation of PB1-F2 products. It is accessible at http://cvc.dfci.harvard.edu/pb1-f2.
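The kind of scan described above can be illustrated with a minimal sketch. This is not the implementation of PB1-F2 Finder; it only assumes that a candidate product is an ATG-initiated open reading frame in a chosen frame of the nucleotide sequence, terminated by the first in-frame stop codon. The function name `orf_lengths`, the `frame` offset, and the `min_codons` threshold are hypothetical names introduced for illustration.

```python
# Illustrative sketch only: report ATG-initiated ORFs in one reading frame.
# Names and the default threshold are assumptions, not taken from PB1-F2 Finder.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_lengths(seq, frame, min_codons=50):
    """Return (start_index, codon_count) for each complete ORF in `frame`.

    `frame` is the 0-based offset (0, 1, or 2) into `seq`.
    `codon_count` includes the start codon and excludes the stop codon.
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    results = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first in-frame start codon opens an ORF
        elif start is not None and codon in STOP_CODONS:
            n = (i - start) // 3
            if n >= min_codons:
                results.append((start, n))
            start = None  # a premature stop closes the ORF
    return results


if __name__ == "__main__":
    # Synthetic example: a 57-codon ORF in frame 1, not real viral sequence.
    toy = "A" + "ATG" + "GCT" * 56 + "TAA"
    print(orf_lengths(toy, frame=1))
```

In this sketch a premature stop codon shortens the reported ORF, so a truncated product falls below `min_codons` and is not reported; lowering the threshold would surface such shorter variants.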