Large-scale comparative studies of DNA fingerprints favor automated chip capillary electrophoresis over conventional planar gel electrophoresis because its digitization is more precise. However, band sizing is still limited by the resolution and sizing accuracy of the device, so band matching remains the key step in DNA fingerprint analysis. Most current methods evaluate only the pairwise similarity of samples and use a heuristically chosen constant threshold for the maximum allowed band size deviation; unfortunately, this approach significantly reduces the ability to distinguish between closely related samples. This study presents a new approach based on a global multiple alignment of the bands of all samples, with an adaptive threshold derived from a detailed analysis of band migration in a large number of real samples. The proposed approach enables accurate automated analysis of DNA fingerprint similarities in extensive epidemiological studies of bacterial strains, thereby helping to prevent the spread of dangerous microbial infections.
Keywords: Automated chip capillary electrophoresis; Band matching; DBSCAN, density-based spatial clustering of applications with noise; DNA fingerprinting; DTW, dynamic time warping; ESBL, extended-spectrum beta-lactamases; Gel sample distortion; Genotyping; KLPN, Klebsiella pneumoniae; MALDI-TOF, matrix-assisted laser desorption/ionization time-of-flight; Pattern recognition; R-square, ratio of the sum of squares; RMSE, root mean squared error; SD, standard deviation; SLINK, single linkage; SSE, sum of squares due to error; UPGMA, unweighted pair group method with arithmetic mean; rep-PCR, repetitive element palindromic polymerase chain reaction.
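The short Python sketch below contrasts the constant-threshold pairwise matching criticized above with a pooled grouping of the bands of all samples. All of its choices are illustrative assumptions and do not reproduce the authors' method. These include the 2% relative tolerance `TOL`, the greedy matching in `pairwise_dice`, and the Dice coefficient as the similarity measure. The one-dimensional single-linkage grouping in `global_band_classes` is only a simple stand-in for a global multiple alignment and has no adaptive threshold. The band sizes are invented.

```python
"""Hypothetical sketch: constant-threshold band matching, pairwise vs. pooled."""
from typing import Dict, List, Set, Tuple

# Assumption: one constant relative tolerance (2% of band size). The study
# instead derives an adaptive threshold, which is NOT reproduced here.
TOL = 0.02


def pairwise_dice(a: List[float], b: List[float], tol: float = TOL) -> float:
    """Dice similarity of two samples; two bands match (greedily, each band
    at most once) if their size difference is within tol of the larger one."""
    a, b = sorted(a), sorted(b)
    used: Set[int] = set()
    matches = 0
    for x in a:
        for j, y in enumerate(b):
            if j not in used and abs(x - y) <= tol * max(x, y):
                used.add(j)
                matches += 1
                break
    total = len(a) + len(b)
    return 2 * matches / total if total else 1.0


def global_band_classes(
    samples: Dict[str, List[float]], tol: float = TOL
) -> List[List[Tuple[float, str]]]:
    """Pool the bands of all samples, sort them by size, and start a new band
    class wherever the gap to the previous band exceeds tol (1-D single linkage)."""
    pooled = sorted((size, sid) for sid, bands in samples.items() for size in bands)
    if not pooled:
        return []
    classes = [[pooled[0]]]
    for prev, cur in zip(pooled, pooled[1:]):
        if cur[0] - prev[0] <= tol * cur[0]:
            classes[-1].append(cur)  # close enough: same band class
        else:
            classes.append([cur])  # gap too large: open a new class
    return classes


def profiles(
    samples: Dict[str, List[float]], classes: List[List[Tuple[float, str]]]
) -> Dict[str, Set[int]]:
    """Map each sample to the set of band-class indices in which it has a band."""
    prof: Dict[str, Set[int]] = {sid: set() for sid in samples}
    for k, members in enumerate(classes):
        for _, sid in members:
            prof[sid].add(k)
    return prof


def dice(p: Set[int], q: Set[int]) -> float:
    """Dice similarity of two binary band-class profiles."""
    total = len(p) + len(q)
    return 2 * len(p & q) / total if total else 1.0


# Hypothetical band sizes (bp) for three samples; not data from the study.
samples = {"A": [200, 350, 500], "B": [203, 352, 620], "C": [198, 500, 615]}

if __name__ == "__main__":
    prof = profiles(samples, global_band_classes(samples))
    for s, t in [("A", "B"), ("A", "C"), ("B", "C")]:
        # columns: pair, pairwise Dice, Dice on pooled band classes
        print(s, t, round(pairwise_dice(samples[s], samples[t]), 3),
              round(dice(prof[s], prof[t]), 3))
```

For samples B and C, the 198 bp and 203 bp bands differ by more than the tolerance, so pairwise matching scores the pair at 0.333. The pooled grouping links both bands through the 200 bp band of A and scores the pair at 0.667. This shows that pairwise constant-threshold matching is not transitive. It also shows that single linkage can chain distinct bands into one class, so neither scheme with a fixed tolerance is robust for closely related samples.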