Search | arXiv e-print repository

Viash: from scripts to pipelines

Authors: Robrecht Cannoodt, Hendrik Cannoodt, Eric Van de Kerckhove, Andy Boschmans, Dries De Maeyer, Toni Verbeiren

Abstract: Most bioinformatics pipelines consist of software components that are tightly coupled to the logic of the pipeline itself. This limits reusability of the individual components in the pipeline or introduces maintenance overhead when they need to be reimplemented in multiple pipelines. We introduce Viash, a tool for speeding up development of robust pipelines through "code-first" prototyping, separation of concerns and code generation of modular pipeline components. By decoupling the component functionality from the pipeline logic, component functionality becomes fully pipeline-agnostic, and conversely the resulting pipelines are agnostic towards specific component requirements. This separation of concerns improves reusability of components and facilitates multidisciplinary and pan-organisational collaborations. Viash has been applied in a variety of projects, from proof-of-concept pipelines to supporting an international data science competition. Viash is available as an open-source project at https://github.com/viash-io/viash and documentation is available at https://viash.io.

Submitted 21 October, 2021; originally announced October 2021.

Comments: 6 pages, 3 figures

arXiv:2001.10641

doi 10.32614/RJ-2020-007

The Rockerverse: Packages and Applications for Containerization with R

Authors: Daniel Nüst, Dirk Eddelbuettel, Dom Bennett, Robrecht Cannoodt, Dav Clark, Gergely Daroczi, Mark Edmondson, Colin Fay, Ellis Hughes, Lars Kjeldgaard, Sean Lopp, Ben Marwick, Heather Nolis, Jacqueline Nolis, Hong Ooi, Karthik Ram, Noam Ross, Lori Shepherd, Péter Sólymos, Tyson Lee Swetnam, Nitesh Turaga, Charlotte Van Petegem, Jason Williams, Craig Willis, Nan Xiao

Abstract: The Rocker Project provides widely used Docker images for R across different application scenarios. This article surveys downstream projects that build upon the Rocker Project images and presents the current state of R packages for managing Docker images and controlling containers. These use cases cover diverse topics such as package development, reproducible research, collaborative work, cloud-based data processing, and production deployment of services. The variety of applications demonstrates the power of the Rocker Project specifically and containerisation in general. Across the diverse ways to use containers, we identified common themes: reproducible environments, scalability and efficiency, and portability across clouds. We conclude that the current growth and diversification of use cases is likely to continue its positive impact, but see the need for consolidating the Rockerverse ecosystem of packages, developing common practices for applications, and exploring alternative containerisation software.

Submitted 17 August, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: Source code for the article is available at https://github.com/nuest/rockerverse-paper/ ; the updated version includes some new paragraphs and corrections throughout the text; full diff available at https://github.com/nuest/rockerverse-paper/compare/preprint.v2...preprint.v3

MSC Class: 68N01

ACM Class: D.2.6; D.2.7; K.6.3

Journal ref: The R Journal (2020), 12:1, pages 437-461

arXiv:1812.00661

Essential guidelines for computational method benchmarking

Authors: Lukas M. Weber, Wouter Saelens, Robrecht Cannoodt, Charlotte Soneson, Alexander Hapfelmeier, Paul P. Gardner, Anne-Laure Boulesteix, Yvan Saeys, Mark D. Robinson

Abstract: In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

Submitted 3 June, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

Comments: Minor updates

Showing 1–3 of 3 results for author: Cannoodt, R