Crowdsourcing allows running simple human intelligence tasks on a large crowd of workers. This makes it possible to solve problems for which it is difficult to formulate an algorithm or to train a machine learning model in reasonable time. One such problem is clustering data by an under-specified criterion that is simple for humans but difficult for machines. In this demonstration paper, we build a crowdsourced system for image clustering and release its code under a free license at https://github.com/Toloka/crowdclustering. We ran experiments on two different image datasets: dresses from Zalando's FEIDEGGER and shoes from the Toloka Shoes Dataset. They confirm that meaningful clusters can be obtained purely through crowdsourcing, without any machine learning algorithms.