In the context of supervised machine learning, a learning curve describes how a model's performance on unseen data relates to the number of samples used to train the model. In this paper we present a dataset of plant images with representatives of crops and weeds common to the Manitoba prairies at different growth stages. We determine the learning curve for a classification task on this data with the ResNet architecture. Our results are in accordance with previous studies and add to the evidence that learning curves are governed by power-law relationships across large scales, applications, and models. We further investigate how label noise and a reduction in the number of trainable parameters impact the learning curve on this dataset. Both effects lead to the model requiring disproportionately larger training sets to achieve the same classification performance as observed without these effects.
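As an illustration of what a power-law learning curve means in practice, the following minimal sketch fits the form error(n) ≈ a · n^(−b) by linear regression in log-log space. The functional form without an offset term, the function names `fit_power_law` and `samples_needed`, and all numbers are assumptions made for this example; they are synthetic and are not results or code from the paper.

```python
import numpy as np


def fit_power_law(n_samples, errors):
    """Fit errors ~ a * n**(-b) by least squares in log-log space.

    Assumes a pure power law with no irreducible-error offset,
    which is a simplification chosen for this sketch.
    """
    log_n = np.log(np.asarray(n_samples, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # A straight line in log-log space: log(e) = log(a) - b * log(n)
    slope, intercept = np.polyfit(log_n, log_e, deg=1)
    return float(np.exp(intercept)), float(-slope)


def samples_needed(target_error, a, b):
    """Invert the fitted curve: training-set size that reaches target_error."""
    return (a / target_error) ** (1.0 / b)


# Synthetic training-set sizes and errors (made up, not from the paper).
n = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
err = 2.0 * n ** -0.35

a, b = fit_power_law(n, err)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"samples for half the error at n=800: {samples_needed(err[3] / 2, a, b):.0f}")
```

Under this form, halving the error requires multiplying the training-set size by 2^(1/b), so a smaller exponent b (a flatter curve) translates into a disproportionately larger data requirement for the same performance.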