Nanopore sequencing-based genome assembly and evolutionary genomics of circum-basmati rice

Jae Young Choi; Zoe N Lye; Simon C Groen; Xiaoguang Dai; Priyesh Rughani; Sophie Zaaijer; Eoghan D Harrington; Sissel Juul; Michael D Purugganan

doi:10.1186/s13059-020-1938-2

Genome Biol. 2020 Feb 5;21(1):21. doi: 10.1186/s13059-020-1938-2.

Authors

Jae Young Choi¹, Zoe N Lye², Simon C Groen², Xiaoguang Dai³, Priyesh Rughani³, Sophie Zaaijer⁴, Eoghan D Harrington³, Sissel Juul³, Michael D Purugganan⁵,⁶

Affiliations

¹ Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA.
² Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA.
³ Oxford Nanopore Technologies, New York, NY, USA.
⁴ New York Genome Center, New York, NY, USA.
⁵ Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA.
⁶ Center for Genomics and Systems Biology, NYU Abu Dhabi Research Institute, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates.

Abstract

Background: The circum-basmati group of cultivated Asian rice (Oryza sativa) contains many iconic varieties and is widespread in the Indian subcontinent. Despite its economic and cultural importance, a high-quality reference genome is currently lacking, and the group's evolutionary history is not fully resolved. To address these gaps, we use long-read nanopore sequencing and assemble the genomes of two circum-basmati rice varieties.

Results: We generate two high-quality, chromosome-level reference genomes that represent the 12 chromosomes of Oryza. The assemblies have contig N50 values of 6.32 Mb and 10.53 Mb for Basmati 334 and Dom Sufid, respectively. Using our highly contiguous assemblies, we characterize structural variations segregating across circum-basmati genomes. We discover repeat expansions not observed in japonica (the rice group most closely related to circum-basmati), as well as presence and absence variants of over 20 Mb, one of which is a circum-basmati-specific deletion of a gene regulating awn length. We further detect strong evidence of admixture between the circum-basmati and circum-aus groups. This gene flow has its greatest effect on chromosome 10, causing both structural variation and single-nucleotide polymorphism to deviate from genome-wide history. Lastly, population genomic analysis of 78 circum-basmati varieties shows three major geographically structured genetic groups: Bhutan/Nepal, India/Bangladesh/Myanmar, and Iran/Pakistan.
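For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are hypothetical illustrations and are not taken from the Basmati 334 or Dom Sufid assemblies; the authors' own tooling is not described in this record.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk contigs from longest to shortest, accumulating their lengths.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in Mb), not data from the paper.
toy_contigs = [8, 6, 5, 4, 3, 2, 2]
print(n50(toy_contigs))
```

A larger N50 means that half of the assembled sequence sits in longer contigs, which is why the statistic is used as a measure of assembly contiguity.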

Conclusion: The availability of high-quality reference genomes allows functional and evolutionary genomic analyses that provide genome-wide evidence for gene flow between circum-aus and circum-basmati, describe the nature of circum-basmati structural variation, and reveal the presence/absence variation in this important and iconic rice variety group.

Keywords: Admixture; Aromatic rice group; Asian rice; Aus; Awnless; Basmati; Crop evolution; De novo genome assembly; Domestication; Indica; Japonica; Nanopore sequencing; Oryza sativa.

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Chromosomes, Plant / genetics
Contig Mapping / methods
Evolution, Molecular
Genome, Plant
Nanopore Sequencing / methods*
Oryza / classification
Oryza / genetics*
Phylogeny
Whole Genome Sequencing / methods*