Structural variations (SVs) in genomic DNA can have profound effects on the evolution of living organisms, on phenotypic variation and on disease processes. A critical step in discovering the full extent of structural variation is the development of tools that characterize these variations accurately in next-generation sequencing data. Toward this goal, we developed a software pipeline named digit that implements a novel measure of mapping ambiguity to discover interchromosomal SVs from mate-pair and paired-end sequencing data. The workflow robustly handles the high number of artifacts present in mate-pair sequencing and reduces the false positive rate while maintaining sensitivity. On a simulated data set, the workflow recovered 96% of the simulated SVs. The pipeline also generates a self-updating library of common translocations and allows for the investigation of patient- or group-specific events, making it suitable for discovering and cataloging chromosomal translocations associated with specific groups, traits, diseases or population structures.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.