Soil bacterial communities have long been recognized as important ecosystem components and have been the focus of many local and regional studies. However, data on the biodiversity of soil microorganisms at large spatial scales remain scarce; national or more extensive studies to date have typically relied on low numbers of haphazardly collected samples. This has left large spatial gaps in soil microbial biodiversity data. Using a pre-existing dataset of bacterial community composition across a 16-km regular sampling grid in France, we show that the number of detected OTUs changes little under different sampling designs (grid, random, or representative) but increases with the number of samples collected. All common OTUs present in the full dataset were detected when analyzing just 4% of the samples, yet the number of rare OTUs increased exponentially with sampling effort. These results indicate that far more intensive sampling, across all global biomes, is required to capture the biodiversity of soil microorganisms. We propose avenues such as citizen science to make such large-sample datasets more realistically achievable. Furthermore, we argue that taking advantage of pre-existing resources and programs, using current technologies efficiently, and considering the potential of future technologies will ensure better outcomes from large, extensive sampling surveys. Overall, decreasing the spatial gaps in global soil microbial diversity data will increase our understanding of what governs the distribution of soil taxa, and of how these distributions, and therefore their ecosystem contributions, will continue to change in the future.
Keywords: biodiversity; biogeography; global datasets; national datasets; soil bacteria.
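The subsampling idea summarized in the abstract (detected OTUs as a function of the number of samples analyzed, with common OTUs saturating early) can be illustrated with a sample-based accumulation sketch. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each soil sample has been reduced to a set of OTU identifiers (presence/absence), it implements only the random design (not the grid or representative designs), and the function names, the `min_frac` occupancy threshold used to separate "common" from "rare" OTUs, and the toy data are all hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def split_common_rare(samples, min_frac=0.1):
    """Split OTUs into 'common' and 'rare' by occupancy across all samples.

    Hypothetical rule: an OTU is common if it occurs in at least
    min_frac of the samples; every other observed OTU is rare.
    """
    occurrence = Counter(otu for s in samples for otu in s)
    cutoff = min_frac * len(samples)
    common = {otu for otu, k in occurrence.items() if k >= cutoff}
    rare = set(occurrence) - common
    return common, rare


def detected_otus(samples, n, reps=100, seed=0, target=None):
    """Mean number of distinct OTUs found in n samples drawn at random.

    samples: list of sets of OTU IDs, one set per soil sample.
    target:  optional set of OTUs (e.g. only the common ones) to count.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(reps):
        subset = rng.sample(samples, n)      # random design, no replacement
        found = set().union(*subset)         # OTUs detected in the subset
        if target is not None:
            found &= target                  # keep only OTUs of interest
        counts.append(len(found))
    return mean(counts)


# Hypothetical toy data: four samples, presence/absence of OTUs a to e.
samples = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"a", "e"}]
common, rare = split_common_rare(samples, min_frac=0.5)
for n in range(1, len(samples) + 1):
    print(n, detected_otus(samples, n),
          detected_otus(samples, n, target=common),
          detected_otus(samples, n, target=rare))
```

Counting common and rare OTUs separately shows the pattern described above in miniature: the widespread OTUs are recovered by small subsets, while the count of rare OTUs keeps rising as more samples are added.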