Compositional framework for multitask learning in the identification of cleavage sites of HIV-1 protease

J Biomed Inform. 2020 Feb:102:103376. doi: 10.1016/j.jbi.2020.103376. Epub 2020 Jan 11.

Abstract

In the biomedical domain, limited patient samples and the high cost of generating annotated data result in small datasets. As a consequence, predictions from a model trained on a single small dataset, which usually capture only the associations present in that dataset, fail to yield robust insights. To cope with data sparsity, a promising strategy of combining data from different related tasks is used in various applications. Motivated by successful work in various bioinformatics applications, we propose a multitask learning model based on multiple kernels that exploits the dependencies among related tasks. This work aims to combine knowledge from experimental studies on different datasets to build stronger predictive models for HIV-1 protease cleavage site prediction. In this study, a set of peptide data from one source is referred to as a 'task'; to integrate interactions from multiple tasks, our method exploits common features and parameter sharing across the data sources. The proposed framework uses feature integration, feature selection, multi-kernel learning, and a multifactorial evolutionary algorithm to model multitask learning. The framework considers seven different feature descriptors and four different kernel variants of support vector machines to form the optimal multi-kernel learning model. To validate the effectiveness of the model, performance measures such as average accuracy and area under the curve were evaluated. We also carried out the Friedman test and a post hoc statistical test to substantiate the significant improvement achieved by the proposed framework. The results of the extensive experiments confirm the belief that multitask learning can improve performance in cleavage site identification.
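
The multi-kernel idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes octamer peptides, a one-hot (orthogonal) encoding, only two kernels (linear and RBF) instead of the four variants the framework considers, and kernel weights fixed by hand rather than optimised. All function names, parameter values, and the toy peptide strings are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(peptide):
    """Orthogonal encoding: 20 binary values per residue (an assumed descriptor)."""
    vec = []
    for residue in peptide:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def linear_kernel(x, y):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.1):
    """Gaussian kernel; gamma is an illustrative value, not taken from the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_gram(peptides, weights=(0.5, 0.5)):
    """Gram matrix of a weighted sum of a linear and an RBF kernel.

    With non-negative weights the sum of valid kernels is itself a valid
    kernel, so the matrix can be handed to any SVM solver that accepts a
    precomputed kernel.
    """
    w_lin, w_rbf = weights
    feats = [one_hot(p) for p in peptides]
    return [
        [w_lin * linear_kernel(x, y) + w_rbf * rbf_kernel(x, y) for y in feats]
        for x in feats
    ]


# Toy octamers used only to exercise the code; no labels or results are implied.
peptides = ["SQNYPIVQ", "ARVLAEAM", "SQNYPIVA"]
K = combined_gram(peptides)
```

A matrix such as `K` could then be passed to an SVM that supports precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), with the weights treated as parameters to be tuned by whatever search procedure is preferred.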

Keywords: HIV-1 protease; Multifactorial evolution; Multiple Kernel learning; Multitask learning; Protein encoding.

MeSH terms

  • Algorithms*
  • Computational Biology*
  • HIV Protease* / chemistry
  • Learning

Substances

  • HIV Protease
  • p16 protease, Human immunodeficiency virus 1