Image-based machine learning models can make the sorting and grading of agricultural products more efficient. In many regions, however, such systems are difficult to implement because postharvest supply chains lack centralization and automation. Stakeholders are often too small to specialize in machine learning, and large training data sets are unavailable. We propose a machine learning procedure for images based on pre-trained Vision Transformers (ViTs). It is easier to implement than the current standard approach of training Convolutional Neural Networks (CNNs) because it does not require (re-)training deep neural networks. We evaluate our approach on two data sets, one for apple defect detection and one for banana ripeness estimation. Our model achieves competitive classification accuracy, matching the best-performing CNN or falling at most one percent below it. At the same time, it requires three times fewer training samples to reach 90% accuracy.
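The idea of reusing a pre-trained encoder without (re-)training it can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_embed` is a hypothetical placeholder standing in for a frozen pre-trained ViT encoder, the nearest-centroid classifier is one possible lightweight model and not necessarily the one used in the paper, and the 2x2 "images" are made-up toy data.

```python
from collections import defaultdict
from math import dist
from typing import Callable, Dict, Hashable, List, Sequence

Image = Sequence[Sequence[float]]


def toy_embed(image: Image) -> List[float]:
    """Hypothetical stand-in for a frozen, pre-trained ViT encoder.

    Returns [mean intensity, intensity range] so the sketch runs without
    downloading a model. A real pipeline would return the ViT embedding.
    """
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


class NearestCentroid:
    """Lightweight classifier fitted on frozen embeddings (an assumption)."""

    def __init__(self, embed: Callable[[Image], List[float]]) -> None:
        self.embed = embed  # never updated: the encoder stays frozen
        self.centroids: Dict[Hashable, List[float]] = {}

    def fit(self, images: Sequence[Image], labels: Sequence[Hashable]) -> "NearestCentroid":
        # Group the embeddings by label, then average each group per dimension.
        groups = defaultdict(list)
        for image, label in zip(images, labels):
            groups[label].append(self.embed(image))
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in groups.items()
        }
        return self

    def predict(self, image: Image) -> Hashable:
        # Assign the label whose centroid is closest in embedding space.
        vec = self.embed(image)
        return min(self.centroids, key=lambda lbl: dist(vec, self.centroids[lbl]))


# Made-up 2x2 grayscale "apple" images: defects appear as dark pixels.
train_images = [
    [[0.8, 0.8], [0.8, 0.8]],
    [[0.7, 0.7], [0.7, 0.75]],
    [[0.8, 0.1], [0.8, 0.8]],
    [[0.7, 0.7], [0.2, 0.7]],
]
train_labels = ["healthy", "healthy", "defect", "defect"]

model = NearestCentroid(toy_embed).fit(train_images, train_labels)
print(model.predict([[0.75, 0.75], [0.75, 0.15]]))
```

Only the per-class centroids are estimated from the labelled samples; the encoder itself is never updated, which is the property that keeps such a pipeline simple to implement.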