Objective: To investigate how different logistic regression estimators perform when applied to respondent-driven sampling (RDS) samples drawn from simulated and real data. Methods: Four simulated populations were created by combining different connectivity models, levels of clustering, and infection processes. Each subject in the population received two attributes, only one of which was related to the infection process. From each population, RDS samples of different sizes were drawn. RDS samples were likewise drawn from a real-world dataset. Three logistic regression estimators were applied to assess the association between the attributes and infection status, and the observed coverage of each estimator was then measured. Results: The type of connectivity had more impact on estimator performance than the level of clustering. In the simulated datasets, the unweighted logistic regression estimator emerged as the best option, although all estimators performed fairly well. In the real dataset, the weighted estimators showed some instability, making them a risky option. Conclusion: An unweighted logistic regression estimator is a reliable option for RDS samples, with performance similar to that obtained with random samples, and should therefore be the preferred option.
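To make the notion of observed coverage concrete, the following is a minimal sketch, not the authors' procedure. It assumes that "observed coverage" means the share of 95% Wald confidence intervals that contain the true coefficient, and it draws simple random samples rather than RDS samples, so it only illustrates the random-sample baseline mentioned in the conclusion. The function names `fit_logistic` and `observed_coverage`, the single binary attribute, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def fit_logistic(x, y, iters=25, tol=1e-10):
    """Unweighted logistic regression with one covariate, fitted by Newton-Raphson.

    Model: logit P(y = 1) = b0 + b1 * x.
    Returns (b0, b1, se_b1), where se_b1 is the Wald standard error of b1
    taken from the information matrix at the last evaluated iterate.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0             # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0     # entries of the 2x2 information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, using the closed-form 2x2 inverse.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    se_b1 = math.sqrt(h00 / det)  # sqrt of the (b1, b1) entry of H^{-1}
    return b0, b1, se_b1


def observed_coverage(true_b0, true_b1, n, reps, seed=0):
    """Fraction of 95% Wald intervals for b1 that contain true_b1.

    Assumption: each replicate is a simple random sample of size n with a
    binary attribute x (probability 0.5), not an RDS sample.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
        y = []
        for xi in x:
            p = 1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * xi)))
            y.append(1.0 if rng.random() < p else 0.0)
        _, b1, se = fit_logistic(x, y)
        if b1 - 1.96 * se <= true_b1 <= b1 + 1.96 * se:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical settings: intercept -1.0, log-odds ratio 0.7, n = 300.
    print(observed_coverage(-1.0, 0.7, n=300, reps=200))
```

A value close to 0.95 would indicate that the intervals behave as intended in this baseline setting. Repeating the same calculation on RDS samples, and with weighted estimators, is what the comparison described in the abstract would additionally require.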