Detecting small objects in the video streams of head-worn augmented reality devices in near real-time is a major challenge: training data is typically scarce, the input video stream can be of limited quality, and small objects are notoriously hard to detect. In industrial scenarios, however, it is often possible to leverage contextual knowledge for the detection of small objects. Furthermore, CAD data of the objects is typically available and can be used to generate synthetic training data. We describe a near-real-time small object detection pipeline for egocentric perception in a manual assembly scenario: we generate a training data set based on CAD data and realistic backgrounds in Unity. We then train a YOLOv4 model for a two-stage detection process: first, the context is recognized; then, the small object of interest is detected. We evaluate our pipeline on the augmented reality device Microsoft HoloLens 2.