Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets

Mol Pharm. 2019 Apr 1;16(4):1620-1632. doi: 10.1021/acs.molpharmaceut.8b01297. Epub 2019 Feb 26.

Abstract

The human immunodeficiency virus (HIV) causes over a million deaths every year and has a huge economic impact in many countries. The first class of drugs approved was the nucleoside reverse transcriptase inhibitors. A newer generation of reverse transcriptase inhibitors has become susceptible to drug-resistant strains of HIV, and hence alternatives are urgently needed. We have recently pioneered the use of Bayesian machine learning on public data to build models that identify new compounds for testing against different disease targets. The current study used the NIAID ChemDB HIV, Opportunistic Infection and Tuberculosis Therapeutics Database for machine learning. We curated and cleaned data from HIV-1 wild-type cell-based and reverse transcriptase (RT) DNA polymerase inhibition assays. Compounds from this database with ≤1 μM HIV-1 RT DNA polymerase activity inhibition and cell-based HIV-1 inhibition are correlated (Pearson r = 0.44, n = 1137, p < 0.0001). Models were trained using multiple machine learning approaches (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, support vector classification, k-Nearest Neighbors, and deep neural networks, as well as consensus approaches), and their predictive abilities were then compared. Using 5-fold cross-validation and 24 training and test set combinations, our comparison showed that support vector classification, deep learning, and a consensus approach performed comparably and were not significantly different from each other. Consistent with our previous studies on various targets, this study shows that training and testing with multiple data sets reveals no significant difference between support vector machines and deep neural networks.
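The reported correlation can be sanity-checked from its summary statistics alone. The sketch below is illustrative and is not the authors' analysis code: it computes Pearson r from raw pairs, converts r and n into the usual test statistic t = r·sqrt(n - 2)/sqrt(1 - r²) for the null hypothesis of zero correlation, and, because n is large, approximates the two-sided p-value with a standard normal tail. The normal approximation is an assumption; an exact value would use the t distribution with n - 2 degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def approx_two_sided_p(t):
    """Two-sided p-value from a standard normal approximation (assumes large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Summary statistics quoted in the abstract.
r, n = 0.44, 1137
t = t_statistic(r, n)
p = approx_two_sided_p(t)
print(f"t = {t:.1f}, approximate p = {p:.1e}")
```

With r = 0.44 and n = 1137, t comes out near 16.5, so the p-value is many orders of magnitude below the 0.0001 bound stated in the abstract.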
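The model comparison rests on labeling compounds active at a ≤1 μM cutoff and scoring classifiers with 5-fold cross-validation. The standard-library sketch below shows only the shape of that protocol: it binarizes a made-up potency value at 1 μM, builds stratified folds, and scores two of the listed methods, a Bernoulli Naive Bayes model and a k-nearest-neighbors model using Tanimoto similarity, on toy bit-vector fingerprints. The synthetic data, the helper names, the smoothing constant, k = 3 and the use of accuracy as the metric are all hypothetical choices for illustration; they are not the study's descriptors, implementations or metrics.

```python
import math
import random


def label_active(ic50_um, threshold_um=1.0):
    """Hypothetical binarization: 1 (active) when potency is at or below the cutoff in uM."""
    return 1 if ic50_um <= threshold_um else 0


def make_toy_data(n=60, n_bits=16, seed=1):
    """Synthetic fingerprints whose first four bits drive a made-up potency value."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        bits = [rng.randint(0, 1) for _ in range(n_bits)]
        ic50_um = 0.1 if sum(bits[:4]) >= 3 else 10.0  # invented potency, not real data
        X.append(bits)
        y.append(label_active(ic50_um))
    return X, y


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, dealing each class out round-robin after shuffling."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Per-class log prior and Laplace-smoothed probability that each bit is set."""
    model = {}
    for cls in set(y):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        log_prior = math.log(len(rows) / len(X))
        probs = [(sum(r[b] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for b in range(len(X[0]))]
        model[cls] = (log_prior, probs)
    return model


def nb_predict(train_X, train_y, test_X):
    """Fit once on the training folds, then pick the class with the highest log posterior."""
    model = fit_bernoulli_nb(train_X, train_y)

    def log_posterior(cls, x):
        log_prior, probs = model[cls]
        return log_prior + sum(math.log(p if bit else 1.0 - p) for bit, p in zip(x, probs))

    return [max(model, key=lambda cls: log_posterior(cls, x)) for x in test_X]


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two bit vectors."""
    both = sum(1 for u, v in zip(a, b) if u and v)
    either = sum(1 for u, v in zip(a, b) if u or v)
    return both / either if either else 1.0


def knn_predict(train_X, train_y, test_X, k=3):
    """Majority vote among the k most Tanimoto-similar training compounds."""
    preds = []
    for x in test_X:
        order = sorted(range(len(train_X)), key=lambda i: tanimoto(train_X[i], x), reverse=True)
        votes = sum(train_y[i] for i in order[:k])
        preds.append(1 if 2 * votes > k else 0)
    return preds


def cross_validate(X, y, predict, k=5, seed=0):
    """Mean accuracy of predict(train_X, train_y, test_X) over k stratified folds."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                        [X[i] for i in test_idx])
        scores.append(sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return sum(scores) / len(scores)


X, y = make_toy_data()
for name, predict in [("Bernoulli NB", nb_predict), ("kNN (Tanimoto)", knn_predict)]:
    print(f"{name}: mean 5-fold accuracy = {cross_validate(X, y, predict):.2f}")
```

Any other classifier can be compared the same way by supplying a function with the `(train_X, train_y, test_X)` signature; the study's AdaBoost, random forest, SVC, deep neural network and consensus models are not reproduced here.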

Keywords: HIV; Naïve Bayes; Assay Central; deep learning; drug discovery; machine learning; reverse transcriptase; support vector machine.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Anti-HIV Agents / pharmacology*
  • Bayes Theorem
  • Databases, Factual
  • Decision Trees
  • Drug Discovery
  • HIV / drug effects*
  • HIV Infections / drug therapy*
  • HIV Infections / virology
  • HIV Reverse Transcriptase / antagonists & inhibitors*
  • Humans
  • Machine Learning*
  • Neural Networks, Computer
  • Reverse Transcriptase Inhibitors / pharmacology*
  • Support Vector Machine

Substances

  • Anti-HIV Agents
  • Reverse Transcriptase Inhibitors
  • reverse transcriptase, Human immunodeficiency virus 1
  • HIV Reverse Transcriptase