Increasing trust in new data sources: crowdsourcing image classification for ecology

Crowdsourcing methods facilitate the production of scientific information by non-experts. This form of citizen science (CS) is becoming a key source of complementary data in many fields to inform data-driven decisions and study challenging problems. However, concerns about the validity of these data often constrain their utility. In this paper, we focus on the use of citizen science data in addressing complex challenges in environmental conservation. We consider this issue from three perspectives. First, we present a literature scan of papers that have employed Bayesian models with citizen science in ecology. Second, we compare several popular majority vote algorithms and introduce a Bayesian item response model that estimates and accounts for participants' abilities after adjusting for the difficulty of the images they have classified. The model also enables participants to be clustered into groups based on ability. Third, we apply the model in a case study involving the classification of corals from underwater images from the Great Barrier Reef, Australia. We show that the model achieved superior results in general and, for difficult tasks, a weighted consensus method that uses only groups of experts and experienced participants produced better performance measures. Moreover, we found that participants learn as they have more classification opportunities, which substantially increases their abilities over time. Overall, the paper demonstrates the feasibility of CS for answering complex and challenging ecological questions when these data are appropriately analysed. This serves as motivation for future work to increase the efficacy and trustworthiness of this emerging source of data.
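To make the contrast between plain majority voting and an ability-weighted consensus concrete, here is a minimal Python sketch. It is not the authors' implementation: the participant names, the weights, and the one-parameter logistic (Rasch-style) form of `p_correct` are all assumptions made for illustration, and the item response model in the paper may be parameterised differently.

```python
import math
from collections import Counter, defaultdict


def p_correct(ability, difficulty):
    """Assumed Rasch-style link: chance of a correct classification
    rises with participant ability and falls with image difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def majority_vote(votes):
    """Return the label chosen by the most participants.
    votes: list of (participant, label) pairs for one image."""
    return Counter(label for _, label in votes).most_common(1)[0][0]


def weighted_vote(votes, weights):
    """Return the label with the largest total weight.
    weights: participant -> weight; unknown participants count as 0."""
    totals = defaultdict(float)
    for participant, label in votes:
        totals[label] += weights.get(participant, 0.0)
    return max(totals, key=totals.get)


# Hypothetical votes on one difficult image: three novices against two experts.
votes = [
    ("novice_1", "algae"),
    ("novice_2", "algae"),
    ("novice_3", "algae"),
    ("expert_1", "coral"),
    ("expert_2", "coral"),
]

# Hypothetical ability estimates, converted to weights for an image
# of average difficulty (difficulty = 0).
abilities = {"novice_1": -1.0, "novice_2": -1.0, "novice_3": -1.0,
             "expert_1": 2.0, "expert_2": 2.0}
weights = {name: p_correct(a, 0.0) for name, a in abilities.items()}

plain = majority_vote(votes)              # decided by head count
weighted = weighted_vote(votes, weights)  # decided by total weight
```

With these made-up numbers the head count favours "algae", while the weighted consensus favours "coral" because the two experts carry more total weight than the three novices. This mirrors the abstract's point that weighting by estimated ability can matter most on difficult images.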
