Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles

J Cheminform. 2016 Nov 24;8:67. doi: 10.1186/s13321-016-0179-6. eCollection 2016.

Abstract

Predictive modelling in drug discovery is challenging to automate because it often involves multiple analysis steps, and cross-validation and parameter tuning can create complex dependencies between tasks. With large-scale data or computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, which adds to the existing challenges of fault-tolerant automation. Workflow management systems can help with many of these challenges, but the currently available systems lack the functionality needed to enable agile and flexible predictive modelling. Here we present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system, which we name SciLuigi. We also discuss our experiences from using the approach when modelling a large set of biochemical interactions on a shared computer cluster.
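To make the dependency problem concrete, the following is a minimal pure-Python sketch of the flow-based idea, not the SciLuigi or Luigi API. It assumes a hypothetical `Task` class with named in-ports, a toy "model" (a shrunken mean with a `cost` parameter), and hypothetical helper names (`split_fold`, `train_and_score`, `build_workflow`, `run_cross_validation`). The point it illustrates is that tasks do not name their upstream tasks themselves; the connections are made in the workflow definition, so fanning out over folds and parameter values only changes the wiring.

```python
from statistics import mean


class Task:
    """Hypothetical flow-based task: inputs arrive on named in-ports."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.in_ports = {}   # port name -> upstream Task
        self._done = False
        self._result = None

    def connect(self, port, upstream):
        # Wiring happens outside the task, in the workflow definition.
        self.in_ports[port] = upstream

    def run(self):
        # Run upstream tasks first; cache the result so shared tasks run once.
        if not self._done:
            inputs = {port: up.run() for port, up in self.in_ports.items()}
            self._result = self.func(**inputs)
            self._done = True
        return self._result


def split_fold(dataset, fold, n_folds):
    """Every n_folds-th item (offset by fold) is test data; the rest is training data."""
    test = dataset[fold::n_folds]
    train = [x for i, x in enumerate(dataset) if i % n_folds != fold]
    return train, test


def train_and_score(split, cost):
    """Toy model (an assumption for illustration): shrunken mean, scored by MSE."""
    train, test = split
    prediction = sum(train) / (len(train) + cost)
    return mean((x - prediction) ** 2 for x in test)


def build_workflow(data, n_folds, costs):
    """Wire one split task and one scoring task per (cost, fold) pair."""
    source = Task("source", lambda: data)
    scores = {cost: [] for cost in costs}
    for cost in costs:
        for fold in range(n_folds):
            split = Task(
                f"split_f{fold}",
                lambda dataset, fold=fold: split_fold(dataset, fold, n_folds),
            )
            split.connect("dataset", source)
            score = Task(
                f"score_c{cost}_f{fold}",
                lambda split, cost=cost: train_and_score(split, cost),
            )
            score.connect("split", split)
            scores[cost].append(score)
    return scores


def run_cross_validation(data, n_folds, costs):
    """Return the mean test error per cost value."""
    workflow = build_workflow(data, n_folds, costs)
    return {cost: mean(t.run() for t in tasks) for cost, tasks in workflow.items()}


if __name__ == "__main__":
    errors = run_cross_validation([1.0, 2.0, 3.0, 4.0], n_folds=2, costs=[0.0, 1.0])
    best_cost = min(errors, key=errors.get)
    print(errors, best_cost)
```

In this sketch, adding another parameter value or more folds only extends the loops in `build_workflow`; the task functions themselves stay unchanged, which is the kind of flexibility the abstract refers to.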

Keywords: Drug discovery; Flow-based programming; Machine learning; Predictive modelling; Workflows.