Respondent-driven sampling (RDS) is a sampling scheme used in socially connected human populations that lack a sampling frame. One of the first steps in making design-based inferences from RDS data is to estimate the sampling probabilities. A classical approach to this estimation assumes that RDS can be adequately represented by a first-order Markov chain over a fully connected, undirected network. This convenient model, however, ignores the possibility that the network is directed and homophilous. The methods proposed in this work aim to address this issue. The main methodological contributions of this manuscript are twofold: first, we introduce a partially directed and homophilous network configuration model, and second, we develop two mathematical representations of the RDS sampling process over the proposed configuration model. Our simulation study shows that the resulting sampling probabilities are similar to those of RDS and that they improve prevalence estimation under various realistic scenarios.
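To make the classical assumption concrete, the following minimal sketch checks that a first-order random walk on a connected, undirected network has a stationary distribution proportional to node degree, which is why degree-based sampling probabilities are used under that model. It illustrates only the classical baseline, not the configuration model or the representations proposed in this work. The toy edge list and all function names are hypothetical and chosen purely for illustration.

```python
from fractions import Fraction

# Hypothetical toy network (an assumption for illustration only):
# undirected ties among five people labelled 0..4.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N = 5


def neighbors(edges, n):
    """Build an undirected adjacency map from an edge list."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def transition_matrix(adj):
    """Random-walk transitions: each person recruits a uniformly chosen neighbor."""
    n = len(adj)
    return [
        [Fraction(1, len(adj[i])) if j in adj[i] else Fraction(0) for j in range(n)]
        for i in range(n)
    ]


def degree_proportional(adj):
    """Candidate sampling probabilities: degree divided by the sum of all degrees."""
    total = sum(len(nb) for nb in adj.values())
    return [Fraction(len(adj[i]), total) for i in range(len(adj))]


def step(dist, P):
    """Advance a distribution over nodes by one step of the chain."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


adj = neighbors(EDGES, N)
P = transition_matrix(adj)
pi = degree_proportional(adj)

# The degree-proportional distribution is unchanged by one step of the walk,
# so it is stationary for this undirected network.
is_stationary = step(pi, P) == pi
print(is_stationary, [str(p) for p in pi])
```

Exact fractions are used so that the stationarity check is an equality rather than a floating-point comparison. If some ties were directed, the same check would generally fail, which is the gap the abstract points to.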