Algorithm audits have increased in recent years due to a growing need to independently assess the performance of automatically curated services that process, filter and rank the large and dynamic amount of information available on the internet. Among the several methodologies for performing such audits, virtual agents stand out because they make it possible to run systematic experiments that simulate human behaviour without the costs associated with recruiting participants. Motivated by the importance of research transparency and the replicability of results, this paper focuses on the challenges of such an approach and provides methodological details, recommendations, lessons learned and limitations that researchers should take into consideration when setting up experiments with virtual agents. We demonstrate the successful performance of our research infrastructure in multiple data collections with diverse experimental designs, and point to the changes and strategies that improved the quality of the method. We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms over longer periods of time, and we hope that this paper serves as a basis for widening research in this direction.