Soil DNA profiling has potential as a forensic tool for establishing a link between soil collected at a crime scene and soil recovered from a suspect. However, a quantitative measure is needed to investigate the spatial and temporal variability of soil DNA profiles across multiple scales before they are applied in forensic science. In this study, soil DNA profiles from across Miami-Dade, FL, were generated using length heterogeneity PCR targeting four taxa. The objectives of this study were to (i) assess the biogeographical patterns of soils to determine whether soil biota are spatially correlated with geographic location and (ii) evaluate the ability of five machine learning algorithms to recognize biotic patterns that could accurately classify soils at different spatial scales regardless of the season of collection. Results demonstrate that soil communities have unique patterns and are spatially autocorrelated. The machine learning algorithms accurately classified soils across all scales, with Random Forest significantly outperforming all other algorithms regardless of spatial level.
Keywords: Random Forest; forensic science; machine learning algorithms; soil DNA profiling; soil provenance; spatial scale.
© 2018 American Academy of Forensic Sciences.