Gene-based SNP identification and validation in soybean using next-generation transcriptome sequencing

Mol Genet Genomics. 2018 Jun;293(3):623-633. doi: 10.1007/s00438-017-1410-5. Epub 2017 Dec 27.

Abstract

Gene-based molecular markers are increasingly used for marker-assisted selection in crop breeding programs. However, identifying genetic variants associated with important agronomic traits remains difficult in soybean. Beyond assessing global expression variation of coding genes, RNA-Seq provides an efficient way to discover gene-based SNPs at the whole-genome level. In this study, RNA isolated from four soybean accessions, each with three replicates, was subjected to high-throughput sequencing, generating 44.2-65.9 million paired-end reads per library. After replicates were combined, a total of 75,209 SNPs were identified among the genotypes, 89.1% of which were located in expressed regions and 27.0% of which resulted in amino acid changes. GO enrichment analysis revealed that the most significantly enriched genes carrying nonsynonymous SNPs were involved in ribonucleotide binding or catalytic activity. All 22 SNPs subjected to PCR amplification and Sanger sequencing were validated. To test the utility of the identified SNPs, the validated SNPs were also used to genotype a relatively large population of 393 wild and cultivated soybean accessions. The SNPs identified by RNA-Seq provide a useful resource for genetic and genomic studies of soybean. Moreover, the collection of nonsynonymous SNPs, annotated with their predicted functional effects, is a valuable asset for further gene discovery, identification of gene variants, and development of functional markers.
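The distinction between synonymous SNPs and nonsynonymous SNPs (those that change an amino acid) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which the abstract does not describe. The function name `classify_snp` and the example codons are hypothetical, and the sketch assumes the standard genetic code and a SNP whose position within a known coding-strand codon is already determined.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon: str, pos: int, alt_base: str) -> str:
    """Classify a single-base substitution inside one codon.

    ref_codon: reference codon on the coding strand (hypothetical input).
    pos:       0-based position of the SNP within the codon (0, 1, or 2).
    alt_base:  alternative allele at that position.
    """
    ref_codon = ref_codon.upper()
    alt_base = alt_base.upper()
    if len(ref_codon) != 3 or pos not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a 3-base codon, pos in 0..2, base in TCAG")
    alt_codon = ref_codon[:pos] + alt_base + ref_codon[pos + 1:]
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "nonsynonymous"


# GAA (Glu) -> GAG (Glu): the encoded amino acid is unchanged.
print(classify_snp("GAA", 2, "G"))
# GAA (Glu) -> AAA (Lys): the encoded amino acid changes.
print(classify_snp("GAA", 0, "A"))
```

In this sketch, a substitution that creates or removes a stop codon is also reported as nonsynonymous; a real annotation tool would typically report such variants separately and would need gene models to locate each SNP within its codon.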

Keywords: Next-generation sequencing; Nonsynonymous SNPs; RNA-Seq; Single-nucleotide polymorphism; Soybean.

MeSH terms

  • Gene Expression Profiling / methods*
  • Genotype
  • Glycine max / classification
  • Glycine max / genetics*
  • High-Throughput Nucleotide Sequencing / methods*
  • Plant Proteins / genetics
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, RNA / methods

Substances

  • Plant Proteins