Algorithm audits have increased in recent years due to a growing need to independently assess the performance of automatically curated services that process, filter, and rank the vast and dynamic amounts of information available on the internet. Among the several methodologies for performing such audits, virtual agents stand out because they enable systematic experiments that simulate human behaviour without the costs associated with recruiting participants. Motivated by the importance of research transparency and the replicability of results, this paper focuses on the challenges of such an approach. It provides methodological details, recommendations, lessons learned, and limitations based on our experience of setting up experiments for eight search engines (including their main, news, image, and video sections) with hundreds of virtual agents placed in different regions. We demonstrate the successful performance of our research infrastructure across multiple data collections with diverse experimental designs, and point to changes and strategies that improve the quality of the method. We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms over long periods of time, and we hope that this paper can serve as a basis for further research in this area.