One of the biggest challenges in machine learning is data collection. Training data matters because it determines how the model will behave. In object classification, capturing a large number of images per object under different conditions is not always possible, and it can be very time-consuming and tedious. Accordingly, this work explores the creation of artificial images with a game engine to cope with limited training data. We combine real and synthetic data to train the object classification engine, a strategy that has been shown to increase confidence in the classifier's decisions, which is often critical in industrial setups. To combine the two, we first train the classifier on a massive amount of synthetic data and then fine-tune it on real images. Another important result is that the number of real images needed for fine-tuning is small: top accuracy is reached with just 12 or 24 images per class. This substantially reduces the need to capture large amounts of real data.
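The two-stage schedule described above (pre-train on plentiful synthetic data, then fine-tune on a few real samples) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes toy 2-D feature vectors in place of images, a logistic-regression classifier in place of the object classification engine, and hypothetical class centers, learning rates, and epoch counts. Only the figure of 12 real samples per class is taken from the text.

```python
import math
import random


def make_samples(rng, n_per_class, centers, spread):
    """Draw n_per_class 2-D feature vectors around each class center."""
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, spread), rng.gauss(cy, spread)), label))
    rng.shuffle(data)
    return data


def predict_proba(weights, x):
    """Probability of class 1 under a logistic model; weights = [w0, w1, bias]."""
    z = weights[0] * x[0] + weights[1] * x[1] + weights[2]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, data, lr, epochs):
    """Plain SGD on the log loss, starting from the given weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(w, x) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            w[2] -= lr * err
    return w


def accuracy(weights, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(weights, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


rng = random.Random(0)

# Assumption: synthetic data is abundant but slightly shifted from the real domain.
synthetic = make_samples(rng, 2000, [(-1.0, -1.0), (1.0, 1.0)], 0.5)
real_centers = [(-0.2, -1.0), (1.8, 1.0)]  # hypothetical domain shift
real_train = make_samples(rng, 12, real_centers, 0.5)  # 12 real samples per class
real_test = make_samples(rng, 500, real_centers, 0.5)

# Stage 1: pre-train from scratch on the large synthetic set.
pretrained = train([0.0, 0.0, 0.0], synthetic, lr=0.1, epochs=3)

# Stage 2: fine-tune the same weights on the small real set, with a lower learning rate.
finetuned = train(pretrained, real_train, lr=0.02, epochs=20)

acc_pre = accuracy(pretrained, real_test)
acc_ft = accuracy(finetuned, real_test)
print(f"real-test accuracy before fine-tuning: {acc_pre:.3f}")
print(f"real-test accuracy after fine-tuning:  {acc_ft:.3f}")
```

The fine-tuning stage starts from the pre-trained weights rather than from zero and uses a smaller learning rate, so the few real samples adjust the synthetic-data solution instead of replacing it. Whether fine-tuning helps in this toy setting depends on the assumed shift between the two domains.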