Community occupancy models estimate species-specific parameters while sharing information across species by treating those parameters as draws from a common distribution. When a community consists of discrete groups, shrinkage of estimates toward the community mean can mask differences among groups. Infinite-mixture models using a Dirichlet process (DP) distribution, in which the number of latent groups is estimated from the data, have been proposed as a solution. In addition to community structure, these models estimate species similarity, which allows testing hypotheses about whether traits drive species' responses to environmental conditions. We develop a community occupancy model (COM) that uses a DP distribution to model species-level parameters. Because clustering algorithms are sensitive to the dimensionality and distinctiveness of clusters, we conducted a simulation study to explore the performance of the DP-COM with different dimensions (i.e., different numbers of model parameters with species-level DP random effects) and under varying differences among clusters. Because the DP-COM is computationally expensive, we compared its estimates to those from a COM with a normal random species effect. We further applied the DP-COM to a bird data set from Uganda. Estimates of the number of clusters and of species cluster identity improved with increasing difference among clusters and increasing dimensions of the DP, but the number of clusters was always overestimated. Estimates of the number of sites occupied and of species- and community-level covariate coefficients on occupancy probability were generally unbiased, with (near-)nominal 95% Bayesian credible interval coverage. Accuracy of estimates from the normal COM and the DP-COM was similar. The DP-COM grouped 166 bird species into 27 clusters according to their affiliation with open or woodland habitat and distance to oil wells. Estimates of covariate coefficients were similar between the normal COM and the DP-COM. Except for sunbirds, species within a family were not more similar in their response to these covariates than species across the overall community. Given that estimates were consistent between the normal COM and the DP-COM, and considering the computational burden of the DP models, we recommend using the DP-COM only when the analysis focuses on community structure and species similarity, as these quantities can only be obtained under the DP-COM.
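To illustrate how a DP prior leaves the number of latent groups open rather than fixing it in advance, the following minimal sketch simulates the Chinese restaurant process, a standard representation of the partition a DP induces. It is not the DP-COM and is not taken from the study: the function name, the concentration value `alpha=2.0`, and the seed are assumptions chosen for illustration, and the community size of 166 is borrowed from the bird data set only to give a sense of scale. No occupancy or detection data enter this simulation, so it shows the prior clustering behavior only.

```python
import random


def chinese_restaurant_process(n_species, alpha, seed=0):
    """Draw one prior partition of n_species under a CRP with concentration alpha.

    Hypothetical helper for illustration; alpha must be positive.
    """
    rng = random.Random(seed)
    labels = []  # cluster label of each species, in order of assignment
    sizes = []   # number of species currently in each cluster
    for _ in range(n_species):
        # An existing cluster k is chosen with weight sizes[k];
        # a new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


# Assumed values: 166 species (as in the bird data set), alpha = 2.0.
labels = chinese_restaurant_process(n_species=166, alpha=2.0)
n_clusters = len(set(labels))
print(n_clusters)
```

Larger values of `alpha` tend to produce more clusters. In a full model the cluster assignments would be updated jointly with the species-level parameters given the data, which this sketch does not attempt.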
Keywords: Dirichlet process; bird point-counts; clustering; community occupancy model; dimensionality; infinite-mixture models; latent groups.
© 2020 by the Ecological Society of America.