Designing an automatic checkout system for retail stores with human-level accuracy is challenging because many products look alike and appear in diverse poses. This paper addresses the problem with a two-stage pipeline: the first stage detects class-agnostic items, and the second stage classifies them into product categories. We also track objects across video frames to avoid counting the same item more than once. A major challenge is the domain gap, since the models are trained on synthetic data but tested on real images. To narrow this gap, we adopt domain generalization methods for the first-stage detector. In addition, we use a model ensemble to improve the robustness of the second-stage classifier. The method is evaluated on the AI City Challenge 2022, Track 4, and achieves an F1 score of $40\%$ on test set A. Code is available at https://github.com/cybercore-co-ltd/aicity22-track4.
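The abstract does not say how per-frame predictions are combined along a track. As an illustration only, the sketch below assumes each detection already carries a track ID from the tracker and one class-probability vector per ensemble member. It averages the ensemble members, sums the averaged scores along each track, and counts each track once under its highest-scoring class. The function names and input format are hypothetical and are not taken from the released code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input format: one list per video frame, each holding
# (track_id, member_probs) pairs, where member_probs has one class-probability
# vector per ensemble member.
Detection = Tuple[int, List[List[float]]]


def ensemble_probs(member_probs: List[List[float]]) -> List[float]:
    """Average class probabilities across ensemble members."""
    n = len(member_probs)
    return [sum(col) / n for col in zip(*member_probs)]


def count_products(frames: List[List[Detection]]) -> Dict[int, int]:
    """Count each tracked object once, labeled by the class with the highest
    accumulated ensemble score over all frames in which the track appears."""
    track_scores: Dict[int, List[float]] = {}
    for detections in frames:
        for track_id, member_probs in detections:
            probs = ensemble_probs(member_probs)
            if track_id not in track_scores:
                track_scores[track_id] = probs
            else:
                # Accumulate scores so one track contributes a single vote.
                track_scores[track_id] = [
                    a + b for a, b in zip(track_scores[track_id], probs)
                ]
    counts: Dict[int, int] = defaultdict(int)
    for summed in track_scores.values():
        label = max(range(len(summed)), key=summed.__getitem__)
        counts[label] += 1
    return dict(counts)


if __name__ == "__main__":
    # Two classes, two ensemble members; track 1 is seen in two frames.
    frames = [
        [(1, [[0.9, 0.1], [0.7, 0.3]]), (2, [[0.2, 0.8], [0.4, 0.6]])],
        [(1, [[0.6, 0.4], [0.8, 0.2]])],
    ]
    print(count_products(frames))
```

Track 1 appears in two frames but is counted only once, which is the purpose of tracking in the pipeline; summing scores over the track (equivalent to averaging for the argmax) also lets confident frames outweigh ambiguous poses.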