Meiotic recombination does not occur randomly across the genome; it occurs at relatively high frequencies in some genomic regions (hotspots) and at relatively low frequencies in others (coldspots). Knowledge of hotspots and coldspots would shed light on the mechanism of recombination, but accurate prediction of hot/cold spots remains an open problem. In this study, we present a model that predicts hot/cold spots in yeast using increment of diversity combined with quadratic discriminant analysis (IDQD), based on sequence k-mer frequencies. Five-fold cross-validation showed a total prediction accuracy of 80.3%. Compared with other machine-learning algorithms, the IDQD approach is as powerful as random forest (RF) and outperforms support vector machine (SVM) in identifying hotspots and coldspots. We also predicted increased recombination rates in the regions upstream of transcription start sites and downstream of transcription termination sites. Additionally, the genome-wide recombination map in yeast obtained with the IDQD model is in close agreement with the experimentally generated map, especially for the peak locations, although some fine-scale differences exist. Our results highlight the sequence dependency of recombination.
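As an illustration of the sequence features mentioned above, the following is a minimal sketch of k-mer counting and an increment-of-diversity score. It assumes the commonly used definition D(X) = N ln N − Σ n_i ln n_i and ID(X, S) = D(X + S) − D(X) − D(S); the abstract does not state the exact formulation, and the function names here are hypothetical. The quadratic discriminant analysis step is not shown.

```python
from collections import Counter
from itertools import product
from math import log


def kmer_counts(seq, k):
    """Count overlapping k-mers over A/C/G/T, in fixed lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other symbols (e.g. N) are simply ignored here.
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def diversity(counts):
    """Assumed diversity measure: D = N ln N - sum(n_i ln n_i)."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * log(n) - sum(c * log(c) for c in counts if c > 0)


def increment_of_diversity(x, s):
    """Assumed ID(X, S) = D(X + S) - D(X) - D(S) for two count vectors."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)
```

Under this assumed definition, the score is zero when two count vectors have the same composition and grows as their k-mer compositions diverge, so a query sequence could be scored against a hotspot reference and a coldspot reference, with the two scores passed to a downstream classifier.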
Copyright © 2011 Elsevier Ltd. All rights reserved.