Kmasker plants - a tool for assessing complex sequence space in plant species

Sebastian Beier; Chris Ulpinnis; Markus Schwalbe; Thomas Münch; Robert Hoffie; Iris Koeppel; Christian Hertig; Nagaveni Budhagatapalli; Stefan Hiekel; Krishna M Pathi; Goetz Hensel; Martin Grosse; Sindy Chamas; Sophia Gerasimova; Jochen Kumlehn; Uwe Scholz; Thomas Schmutzer

doi:10.1111/tpj.14645

Plant J. 2020 May;102(3):631-642. doi: 10.1111/tpj.14645. Epub 2020 Jan 11.

Authors

Sebastian Beier^#¹, Chris Ulpinnis^#², Markus Schwalbe¹, Thomas Münch¹, Robert Hoffie¹, Iris Koeppel¹, Christian Hertig¹, Nagaveni Budhagatapalli¹, Stefan Hiekel¹, Krishna M Pathi¹, Goetz Hensel¹, Martin Grosse¹, Sindy Chamas¹, Sophia Gerasimova¹, Jochen Kumlehn¹, Uwe Scholz¹, Thomas Schmutzer³

Affiliations

¹ Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany.
² Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, 06120, Halle, Germany.
³ Department of Natural Sciences III, Institute for Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, 06120, Halle, Germany.

^# Contributed equally.

PMID: 31823436
DOI: 10.1111/tpj.14645

Abstract

Many plant genomes display high levels of repetitive sequences. The assembly of these complex genomes using short high-throughput sequence reads is still a challenging task. Underestimation or disregard of repeat complexity in these datasets can easily misguide downstream analysis. Detection of repetitive regions by k-mer counting methods has proved to be reliable. Easy-to-use applications utilizing k-mer counting are in high demand, especially in the domain of plants. We present Kmasker plants, a tool that uses k-mer count information to assist throughout the analytical workflow of genome data and is provided as both a command-line and a web-based solution. Beyond the tool's core function of screening and masking repetitive sequences, we have integrated features that enable comparative studies between different cultivars or closely related species, as well as methods that estimate the target specificity of guide RNAs for site-directed mutagenesis using Cas9 endonuclease. In addition, we have set up a web service for Kmasker plants that maintains pre-computed indices for 10 of the economically most important cultivated plants. Source code for Kmasker plants has been made publicly available at https://github.com/tschmutzer/kmasker. The web service is accessible at https://kmasker.ipk-gatersleben.de.
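
The general idea of masking repeats by k-mer counting can be illustrated with a minimal Python sketch. It is not the Kmasker plants implementation: the function names count_kmers and mask_repeats, the choice of k, the fixed count threshold, and masking with N are all assumptions made for illustration. The sketch counts every k-mer in a set of reads, then masks each position of a query sequence that is covered by a k-mer seen at least threshold times.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def mask_repeats(sequence, counts, k, threshold):
    """Replace with 'N' every base covered by a k-mer whose count >= threshold.

    Hypothetical helper: the threshold rule and hard-masking with 'N' are
    illustrative choices, not taken from Kmasker plants.
    """
    seq = sequence.upper()
    masked = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= threshold:
            for j in range(i, i + k):
                masked[j] = True
    return "".join("N" if m else base for base, m in zip(seq, masked))


if __name__ == "__main__":
    # Toy reads: ACGT occurs three times, so it counts as "repetitive" here.
    reads = ["ACGTACGTACGT", "TTGACC"]
    kmer_counts = count_kmers(reads, k=4)
    print(mask_repeats("TTGACCACGT", kmer_counts, k=4, threshold=3))
```

In practice, k-mer indices for plant genomes are built with dedicated counting tools over whole sequencing datasets, and a sensible threshold depends on sequencing depth; the toy values above only show the mechanics.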

Keywords: CRISPR; comparative genomics; genome editing; k-mer analysis; plant genomics; repeat masking; sequence analysis; technical advance.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Gene Editing
Genome, Plant / genetics*
Genomics
RNA, Guide, CRISPR-Cas Systems / genetics
Sequence Analysis, DNA
Software

Substances

RNA, Guide, CRISPR-Cas Systems