Respondent-driven sampling (RDS) is a widely used method for sampling from hard-to-reach human populations, especially populations at higher risk for HIV. Data are collected through peer referral over social networks. RDS has proven practical for data collection in many difficult settings. Inference from RDS data, however, requires many strong assumptions because the sampling design is partially beyond the control of the researcher and partially unobserved. We introduce diagnostic tools for most of these assumptions and apply them in 12 high-risk populations. These diagnostics empower researchers to better understand their data and encourage future statistical research on RDS.
Keywords: HIV/AIDS; diagnostics; exploratory data analysis; hard-to-reach populations; link-tracing sampling; non-ignorable design; respondent-driven sampling; social networks; survey sampling.