In 2008, the genome assembly and gene models for the domestic silkworm, Bombyx mori, were published by a Japanese and Chinese collaborative group. However, that genome assembly contains a non-negligible number of misassembled regions and gaps, owing to the many repetitive sequences in the silkworm genome. The erroneous assembly occasionally causes incorrect gene prediction. Here we performed a hybrid assembly based on 140× deep sequencing of long (PacBio) and short (Illumina) reads. The gaps remaining in the initial assembly were closed using BAC and Fosmid sequences, giving a new total length of 460.3 Mb with 30 gap regions and N50 values of 16.8 Mb for scaffolds and 12.2 Mb for contigs. More RNA-seq and piRNA-seq reads mapped to the new genome assembly than to the previous version, indicating that the new assembly covers more transcribed regions, including repetitive elements. We performed gene prediction on the new genome assembly using available mRNA and protein sequence data. The number of gene models was 16,880, with an N50 of 2154 bp. The new gene models reflected more accurate coding sequences and gene sets than the previous ones. The proportion of repetitive elements was also re-estimated using the new genome assembly and was calculated to be 46.8% of the silkworm genome. The new genome assembly and gene models are provided in SilkBase (http://silkbase.ab.a.u-tokyo.ac.jp).
Keywords: Gene prediction; Genome assembly; Long-read sequencing; Silkworm (Bombyx mori).
Copyright © 2019 The Authors. Published by Elsevier Ltd. All rights reserved.
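
The abstract reports N50 values for scaffolds, contigs, and gene models. As a reminder of what that statistic measures, the following is a minimal sketch of the conventional N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. The function name `n50` and the toy lengths are illustrative assumptions, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (assumed here): sort lengths from longest to
    shortest, accumulate them, and return the length at which the running
    sum first reaches at least half of the total.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (not taken from the silkworm assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

With real data, `lengths` would hold the scaffold, contig, or gene model lengths parsed from the corresponding FASTA or annotation file.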