Ultrafast mapping of short reads to transcriptomic and metagenomic references via lightweight mapping techniques such as pseudoalignment has substantially accelerated several types of analyses without much loss in accuracy compared to alignment-based approaches. Applying pseudoalignment to large reference sequences, such as the genome, is not trivial, however, because of the large size of the references or "targets" (i.e. chromosomes) and the presence of repetitive sequences within an individual reference sequence. These properties can lead to multiple matching locations for a k-mer within a single reference, which in turn can lead to false positive mappings and incorrect reference assignments for a read when the colors across the read's k-mer matches are aggregated. Even when the read is determined to map to the appropriate reference, the increased occurrence of k-mer multi-matches within a reference can prevent the determination of the correct approximate position of the read, which is often critical in applications that map short reads to the genome.
Keywords: genome mapping; pseudoalignment; single-cell ATAC-seq; virtual colors.