Object detection is a core task in computer vision and is used across many domains. However, object detection models still suffer from catastrophic forgetting: whenever new products are introduced, the model must be retrained on both the new-product data and the entire previous dataset. Training cost and training time therefore grow with each update. This is a major challenge in many sectors, particularly retail checkout, where new products are introduced frequently. This study introduces Zero-Retraining Based Recognition and Object Detection (ZeBROD), a methodology designed to address catastrophic forgetting by combining YOLO11n for object localization with DeiT and Proxy Anchor Loss for feature extraction and metric learning. For classification, we use the cosine similarity between the embedding of the target product and the embeddings stored in a Qdrant vector database. In a case study conducted in a retail store with 140 products, the experimental results show that the proposed framework achieves encouraging accuracy on both new and existing products. Furthermore, because no retraining is required, the difference in training time is significant: we achieve almost 3 times the training time efficiency of classical object detection approaches, and this advantage grows as more new products are added to the product database. The average inference time is 580 ms per image containing multiple products on an edge device, supporting the feasibility of the proposed framework for practical use.
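The classification step described above, matching a product embedding against stored embeddings by cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the `EmbeddingIndex` class is a hypothetical in-memory stand-in for the vector database (it does not use the Qdrant API), the 3-dimensional vectors stand in for real DeiT embeddings, and the `threshold` value of 0.5 is an assumption chosen only for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class EmbeddingIndex:
    """Hypothetical in-memory stand-in for a vector database of product embeddings."""

    def __init__(self):
        self._entries = []  # list of (label, unit-length embedding)

    def add(self, label, embedding):
        """Register a reference embedding for a product; no model retraining involved."""
        self._entries.append((label, _normalize(embedding)))

    def classify(self, embedding, threshold=0.5):
        """Return (label, score) of the most similar stored embedding.

        The label is None when the best cosine similarity is below the
        threshold (an assumed rejection rule for unknown products).
        """
        query = _normalize(embedding)
        best_label, best_score = None, -1.0
        for label, ref in self._entries:
            score = sum(q * r for q, r in zip(query, ref))
            if score > best_score:
                best_label, best_score = label, score
        if best_score < threshold:
            return None, best_score
        return best_label, best_score


# Usage: toy vectors stand in for embeddings of cropped detections.
index = EmbeddingIndex()
index.add("cola", [1.0, 0.0, 0.0])
index.add("chips", [0.0, 1.0, 0.0])
print(index.classify([0.9, 0.1, 0.0]))  # closest to "cola"
print(index.classify([0.0, 0.0, 1.0]))  # no close match, label is None

# Adding a new product only inserts an embedding into the index.
index.add("soap", [0.0, 0.0, 2.0])
print(index.classify([0.0, 0.0, 1.0]))  # now matches "soap"
```

In this sketch, supporting a new product amounts to inserting its embedding into the index, which is the property the abstract relies on to avoid retraining the detector.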