16S rRNA amplicon sequences are predominantly used to identify the taxonomic composition of a microbiome, but they can also be used to generate simulated metagenomes to circumvent costly empirical shotgun sequencing. The effectiveness of using "simulated metagenomes" (shotgun metagenomes simulated from 16S rRNA amplicons using a database of full genomes closely related to the amplicons) in nonmodel systems is poorly understood. We sought to determine the accuracy of simulated metagenomes in a nonmodel organism, the Canada goose (Branta canadensis), by comparing metagenomes and metatranscriptomes to simulated metagenomes derived from 16S amplicon sequencing. We found significant differences among the metagenomes, metatranscriptomes, and simulated metagenomes when comparing enzymes, KEGG orthologies (KOs), and metabolic pathways. The simulated metagenomes accurately identified the majority (>70%) of the total enzymes, KOs, and pathways, as well as the majority of the short-chain fatty acid metabolic pathways crucial to folivores. When narrowed in scope to specific genes of interest, however, the simulated metagenomes overestimated the number of antimicrobial resistance genes and underestimated the number of genes related to the breakdown of plant matter. Our results suggest that simulated metagenomes should not be used in lieu of empirical sequencing when studying the functional potential of a nonmodel organism's microbiome. Regarding the function of the Canada goose microbiome, we found unexpected amounts of fermentation pathways, and we found that a few taxa are responsible for large portions of the functional potential of the microbiome.

IMPORTANCE The taxonomic composition of a microbiome is predominantly identified using amplicon sequencing of 16S rRNA genes, but, as a single marker gene, 16S rRNA cannot identify functions (genes). Metagenome and metatranscriptome sequencing can determine microbiome function but can be cost-prohibitive.
Therefore, computational methods have been developed to generate simulated metagenomes derived from 16S rRNA sequences and databases of full-length genomes. Simulated metagenomes can be an effective alternative to empirical sequencing, but their accuracy depends on the genomic database used and on whether that database contains organisms closely related to the 16S sequences. These tools are effective in well-studied systems, but the accuracy of their predictions in nonmodel systems is less well understood. Using a nonmodel bird species, we characterized the function of the microbiome and compared the accuracy of 16S-derived simulated metagenomes to that of sequenced metagenomes. We found that the simulated metagenomes reflect most but not all of the functions detected by empirical metagenome sequencing.
Keywords: Canada goose; avian; metagenome; metatranscriptome; microbiome.