Crowdsourcing allows running simple human intelligence tasks on a large crowd of workers, making it possible to solve problems for which it is difficult to formulate an algorithm or train a machine learning model in reasonable time. One such problem is clustering data by an under-specified criterion that is simple for humans but difficult for machines. In this demonstration paper, we build a crowdsourced system for image clustering and release its code under a free license at https://github.com/Toloka/crowdclustering. Our experiments on two image datasets, dresses from Zalando's FEIDEGGER and shoes from the Toloka Shoes Dataset, confirm that one can obtain meaningful clusters purely through crowdsourcing, without any machine learning algorithms.