Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software

Alexander Sczyrba; Peter Hofmann; Peter Belmann; David Koslicki; Stefan Janssen; Johannes Dröge; Ivan Gregor; Stephan Majda; Jessika Fiedler; Eik Dahms; Andreas Bremges; Adrian Fritz; Ruben Garrido-Oter; Tue Sparholt Jørgensen; Nicole Shapiro; Philip D Blood; Alexey Gurevich; Yang Bai; Dmitrij Turaev; Matthew Z DeMaere; Rayan Chikhi; Niranjan Nagarajan; Christopher Quince; Fernando Meyer; Monika Balvočiūtė; Lars Hestbjerg Hansen; Søren J Sørensen; Burton K H Chia; Bertrand Denis; Jeff L Froula; Zhong Wang; Robert Egan; Dongwan Don Kang; Jeffrey J Cook; Charles Deltel; Michael Beckstette; Claire Lemaitre; Pierre Peterlongo; Guillaume Rizk; Dominique Lavenier; Yu-Wei Wu; Steven W Singer; Chirag Jain; Marc Strous; Heiner Klingenberg; Peter Meinicke; Michael D Barton; Thomas Lingner; Hsin-Hung Lin; Yu-Chieh Liao; Genivaldo Gueiros Z Silva; Daniel A Cuevas; Robert A Edwards; Surya Saha; Vitor C Piro; Bernhard Y Renard; Mihai Pop; Hans-Peter Klenk; Markus Göker; Nikos C Kyrpides; Tanja Woyke; Julia A Vorholt; Paul Schulze-Lefert; Edward M Rubin; Aaron E Darling; Thomas Rattei; Alice C McHardy

doi:10.1038/nmeth.4458

Nat Methods. 2017 Nov;14(11):1063-1071. doi: 10.1038/nmeth.4458. Epub 2017 Oct 2.

Authors

Alexander Sczyrba^{1,2}, Peter Hofmann^{3,4,5}, Peter Belmann^{1,2,4,5}, David Koslicki⁶, Stefan Janssen^{4,7,8}, Johannes Dröge^{3,4,5}, Ivan Gregor^{3,4,5}, Stephan Majda³, Jessika Fiedler^{3,4}, Eik Dahms^{3,4,5}, Andreas Bremges^{1,2,4,5,9}, Adrian Fritz^{4,5}, Ruben Garrido-Oter^{3,4,5,10,11}, Tue Sparholt Jørgensen^{12,13,14}, Nicole Shapiro¹⁵, Philip D Blood¹⁶, Alexey Gurevich¹⁷, Yang Bai¹⁰, Dmitrij Turaev¹⁸, Matthew Z DeMaere¹⁹, Rayan Chikhi^{20,21}, Niranjan Nagarajan²², Christopher Quince²³, Fernando Meyer^{4,5}, Monika Balvočiūtė²⁴, Lars Hestbjerg Hansen¹², Søren J Sørensen¹³, Burton K H Chia²², Bertrand Denis²², Jeff L Froula¹⁵, Zhong Wang¹⁵, Robert Egan¹⁵, Dongwan Don Kang¹⁵, Jeffrey J Cook²⁵, Charles Deltel^{26,27}, Michael Beckstette²⁸, Claire Lemaitre^{26,27}, Pierre Peterlongo^{26,27}, Guillaume Rizk^{27,29}, Dominique Lavenier^{21,27}, Yu-Wei Wu^{30,31}, Steven W Singer^{30,32}, Chirag Jain³³, Marc Strous³⁴, Heiner Klingenberg³⁵, Peter Meinicke³⁵, Michael D Barton¹⁵, Thomas Lingner³⁶, Hsin-Hung Lin³⁷, Yu-Chieh Liao³⁷, Genivaldo Gueiros Z Silva³⁸, Daniel A Cuevas³⁸, Robert A Edwards³⁸, Surya Saha³⁹, Vitor C Piro^{40,41}, Bernhard Y Renard⁴⁰, Mihai Pop^{42,43}, Hans-Peter Klenk⁴⁴, Markus Göker⁴⁵, Nikos C Kyrpides¹⁵, Tanja Woyke¹⁵, Julia A Vorholt⁴⁶, Paul Schulze-Lefert^{10,11}, Edward M Rubin¹⁵, Aaron E Darling¹⁹, Thomas Rattei¹⁸, Alice C McHardy^{3,4,5,11}

Affiliations

¹ Faculty of Technology, Bielefeld University, Bielefeld, Germany.
² Center for Biotechnology, Bielefeld University, Bielefeld, Germany.
³ Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany.
⁴ Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.
⁵ Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany.
⁶ Mathematics Department, Oregon State University, Corvallis, Oregon, USA.
⁷ Department of Pediatrics, University of California, San Diego, California, USA.
⁸ Department of Computer Science and Engineering, University of California, San Diego, California, USA.
⁹ German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany.
¹⁰ Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany.
¹¹ Cluster of Excellence on Plant Sciences (CEPLAS).
¹² Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark.
¹³ Department of Microbiology, University of Copenhagen, Copenhagen, Denmark.
¹⁴ Department of Science and Environment, Roskilde University, Roskilde, Denmark.
¹⁵ Department of Energy, Joint Genome Institute, Walnut Creek, California, USA.
¹⁶ Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
¹⁷ Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia.
¹⁸ Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria.
¹⁹ The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia.
²⁰ Department of Computer Science, Research Center in Computer Science (CRIStAL), Signal and Automatic Control of Lille, Lille, France.
²¹ National Centre of the Scientific Research (CNRS), Rennes, France.
²² Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore.
²³ Department of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, UK.
²⁴ Department of Computer Science, University of Tuebingen, Tuebingen, Germany.
²⁵ Intel Corporation, Hillsboro, Oregon, USA.
²⁶ GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France.
²⁷ Institute of Research in Informatics and Random Systems (IRISA), Rennes, France.
²⁸ Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany.
²⁹ Algorizk-IT consulting and software systems, Paris, France.
³⁰ Joint BioEnergy Institute, Emeryville, California, USA.
³¹ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan.
³² Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
³³ School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.
³⁴ Energy Engineering and Geomicrobiology, University of Calgary, Calgary, Alberta, Canada.
³⁵ Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany.
³⁶ Genevention GmbH, Goettingen, Germany.
³⁷ Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan.
³⁸ Computational Science Research Center, San Diego State University, San Diego, California, USA.
³⁹ Boyce Thompson Institute for Plant Research, New York, New York, USA.
⁴⁰ Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany.
⁴¹ Coordination for the Improvement of Higher Education Personnel (CAPES) Foundation, Ministry of Education of Brazil, Brasília, Brazil.
⁴² Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
⁴³ Department of Computer Science, University of Maryland, College Park, Maryland, USA.
⁴⁴ School of Biology, Newcastle University, Newcastle upon Tyne, UK.
⁴⁵ Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany.
⁴⁶ Institute of Microbiology, ETH Zurich, Zurich, Switzerland.

Abstract

Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.

MeSH terms

Algorithms
Benchmarking
Metagenomics*
Sequence Analysis, DNA
Software*
