Primer IDs (pIDs) are random oligonucleotide tags used in next-generation sequencing to identify sequences that originate from the same template. These tags are introduced by degenerate primers during the reverse transcription (RT) of RNA molecules into cDNA. The use of pIDs helps to track the number of RNA molecules carried through amplification and sequencing, and allows inconsistencies between reads sharing a pID to be resolved. Three potential issues complicate these applications. First, multiple cDNAs may share a pID by chance; we found that while preventing all cDNAs from sharing a pID may be infeasible, it is still practical to limit the number of these collisions. Second, a pID must be observed in at least three sequences to allow error correction; as such, pIDs observed only once or twice must be rejected. If the sequencing product contains copies from a large number of RT templates but yields few reads, our findings indicate that rejecting such pIDs will discard a great deal of data. Third, the use of pIDs could influence amplification and sequencing. We examined the effects of several intrinsic and extrinsic factors on sequencing reads at both the individual and ensemble level.
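The first two issues can be illustrated with a back-of-the-envelope calculation. The following is a minimal sketch, not the analysis used in this work: it assumes that every pID of a given length is equally likely to be drawn (real degenerate primers may be biased) and that the number of reads per template follows a Poisson distribution. The function names `expected_colliding_cdnas` and `fraction_usable_pids` are hypothetical and chosen only for illustration.

```python
import math


def expected_colliding_cdnas(n_templates, pid_length):
    """Expected number of cDNAs whose pID is shared with at least one other cDNA.

    Assumption: each cDNA draws its pID uniformly at random from the
    4**pid_length possible sequences, independently of the others.
    """
    n_pids = 4 ** pid_length
    # Probability that at least one of the other cDNAs drew the same pID.
    p_shared = 1.0 - (1.0 - 1.0 / n_pids) ** (n_templates - 1)
    return n_templates * p_shared


def fraction_usable_pids(n_templates, n_reads, min_reads=3):
    """Fraction of templates expected to be seen in at least min_reads reads.

    Assumption: reads per template are Poisson distributed with mean
    n_reads / n_templates (no amplification bias).
    """
    mean_reads = n_reads / n_templates
    # Poisson probability of observing fewer than min_reads reads.
    p_below = sum(
        math.exp(-mean_reads) * mean_reads ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    print(expected_colliding_cdnas(n_templates=10_000, pid_length=8))
    print(fraction_usable_pids(n_templates=10_000, n_reads=10_000))
```

Under these simplifying assumptions, lengthening the pID reduces the expected number of collisions, while a low ratio of reads to templates leaves most pIDs below the three-read threshold.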
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.