在本教程中,我们将介绍通过公共众包市场进行的数据标记,并介绍一些有效收集标记数据的关键技术,包括聚合、增量重标记和动态定价。
接下来是一个练习环节,参与者选择一个真实的标签收集任务,实验选择标签过程的设置,并在最大的众包市场之一上启动自己的标签收集项目。在教程期间,所有项目都在真正的Toloka人群上运行。当我们在等待群体表演者对参与者的项目进行注释时,我们提出了在高效聚合、增量重标签和动态定价方面的主要理论结果。我们还讨论了众包的优势和劣势,以及对现实任务的适用性,总结了我们5年来在众包方面的研究和行业专业知识。所有参与者都会收到关于他们项目的反馈和实用建议。
讲者:
目录内容:
引言 Part 0: Introduction— The concept of crowdsourcing— Crowdsourcing task examples— Crowdsourcing platforms — Yandex crowdsourcing experience
众包数据收集 Part I: Main components of data collection via crowdsourcing — Decomposition for an effective pipeline — Task instruction & interface: best practices — Quality control techniques Part II: Introduction to Toloka for requesters — How Toloka works — Types of tasks in Toloka — Creating a project in Toloka Part III: Brainstorming the pipeline — Dataset and required labels — Discussion: how to collect labels? — Data labeling pipeline for implementation Part IV: Practical Session Participants: — create — configure — run data labeling projects on real performers in real-time Part V: Theory on efficient aggregation — Aggregation models — Incremental relabeling — Dynamic pricing Part VI: Practical Session — Completing the label collection process Part VII: Discussion of results and conclusions — Project results — Ideas for further work and research — References to literature and other tutorials