Gene-based molecular markers are increasingly used in crop breeding programs for marker-assisted selection. However, identifying genetic variants associated with important agronomic traits has remained difficult in soybean. In addition to assessing global expression variation of coding genes, RNA-Seq provides an efficient way to discover gene-based SNPs at the whole-genome level. In this study, RNA isolated from four soybean accessions, each with three replicates, was subjected to high-throughput sequencing, generating 44.2-65.9 million paired-end reads per library. After the replicates were combined, a total of 75,209 SNPs were identified among the genotypes, 89.1% of which were located in expressed regions and 27.0% of which resulted in amino acid changes. GO enrichment analysis revealed that the most significantly enriched genes carrying nonsynonymous SNPs were involved in ribonucleotide binding or catalytic activity. All 22 SNPs subjected to PCR amplification and Sanger sequencing were validated. To test the utility of the identified SNPs, these validated SNPs were also used to genotype a relatively large population of 393 wild and cultivated soybean accessions. The SNPs identified by RNA-Seq provide a useful resource for genetic and genomic studies of soybean. Moreover, the collection of nonsynonymous SNPs, annotated with their predicted functional effects, provides a valuable asset for further gene discovery, identification of gene variants, and development of functional markers.
Keywords: Next-generation sequencing; Nonsynonymous SNPs; RNA-Seq; Single-nucleotide polymorphism; Soybean.