Crowdsourcing systems aggregate the decisions of many people to help users quickly identify high-quality options, such as the best answers to questions or interesting news stories. A long-standing open question in crowdsourcing is how option quality and human judgement heuristics interact to affect collective outcomes, such as the perceived popularity of options. We address this question by conducting a controlled experiment in which subjects choose between two ranked options whose quality can be independently varied. We use these data to construct a model that quantifies how judgement heuristics and option quality combine when people decide between two options. The model reveals that popularity-based ranking can be unstable: unless the quality difference between the two options is sufficiently large, the higher-quality option is not guaranteed to eventually be ranked on top. To rectify this instability, we create an algorithm that accounts for judgement heuristics to infer the best option and rank it first. This algorithm is guaranteed to be optimal if the data match the model. Even when the data do not match the model, simulations show that in practice this algorithm performs better than, or at least as well as, popularity-based and recency-based ranking for any two-choice question. Our work suggests that algorithms relying on inference of mathematical models of user behavior can substantially improve outcomes in crowdsourcing systems.