LS-GKM: a new gkm-SVM for large-scale datasets

Bioinformatics. 2016 Jul 15;32(14):2196-8. doi: 10.1093/bioinformatics/btw142. Epub 2016 Mar 15.

Abstract

gkm-SVM is a sequence-based method for predicting and detecting the regulatory vocabulary encoded in functional DNA elements, and is a commonly used tool for studying gene regulatory mechanisms. Here we introduce new software, LS-GKM, which removes several limitations of our previous releases, enabling training on much larger-scale (LS) datasets. LS-GKM also provides additional advanced gapped k-mer based kernel functions. With these improvements, LS-GKM achieves considerably higher accuracy than the original gkm-SVM.
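
The gapped k-mer idea behind gkm-SVM can be illustrated with a brute-force sketch. This is a minimal, hypothetical illustration, not the LS-GKM implementation. It assumes that a gapped k-mer is an l-mer window in which only k positions are kept and the remaining l - k positions are treated as gaps, and that the kernel is the inner product of two sequences' gapped k-mer count vectors, normalized to the range [0, 1]. The function names and the small defaults l = 4, k = 3 are chosen only for illustration. Reverse-complement merging and the additional kernel functions provided by LS-GKM are not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def gapped_kmer_counts(seq: str, l: int, k: int) -> Counter:
    """Count gapped k-mers (illustrative definition, an assumption of this sketch).

    For every l-mer window, keep k of the l positions and mark the other
    l - k positions with '-', so the gap pattern is part of the feature key.
    """
    counts: Counter = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for kept in combinations(range(l), k):
            kept_set = set(kept)
            key = "".join(c if i in kept_set else "-" for i, c in enumerate(window))
            counts[key] += 1
    return counts


def gkm_kernel(s1: str, s2: str, l: int = 4, k: int = 3) -> int:
    """Unnormalized kernel: inner product of the two gapped k-mer count vectors."""
    c1 = gapped_kmer_counts(s1, l, k)
    c2 = gapped_kmer_counts(s2, l, k)
    return sum(c1[key] * c2[key] for key in c1.keys() & c2.keys())


def normalized_gkm_kernel(s1: str, s2: str, l: int = 4, k: int = 3) -> float:
    """Cosine-normalized kernel K12 / sqrt(K11 * K22); 0.0 if either sequence is shorter than l."""
    k11 = gkm_kernel(s1, s1, l, k)
    k22 = gkm_kernel(s2, s2, l, k)
    if k11 == 0 or k22 == 0:
        return 0.0
    return gkm_kernel(s1, s2, l, k) / sqrt(k11 * k22)


if __name__ == "__main__":
    print(normalized_gkm_kernel("ACGTACGTTA", "ACGTTCGTAA"))
```

For example, any sequence of length at least l has a normalized self-similarity of 1.0, while two sequences that share no gapped k-mers (such as a poly-A and a poly-C sequence) score 0.0.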

Availability and implementation: C/C++ source code and related scripts are freely available from http://github.com/Dongwon-Lee/lsgkm/ and are supported on Linux and Mac OS X.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Chromatin Immunoprecipitation
  • DNA / genetics
  • Gene Regulatory Networks*
  • Humans
  • Software*
  • Support Vector Machine*

Substances

  • DNA