In multi-object detection with neural networks, the fundamental question is how a network should learn a variable number of bounding boxes across different input images. Previous methods train a detection network by directly assigning each ground-truth bounding box to a specific location in the network's output. However, this assignment procedure makes training heuristic and complicated. In this paper, we reformulate multi-object detection as density estimation over bounding boxes. Instead of assigning each ground truth to a specific location in the network's output, we train the network to estimate the probability density of bounding boxes in an input image using a mixture model. To this end, we propose a novel object detection network, the Mixture Density Object Detector (MDOD), together with a corresponding objective function for density-estimation-based training. We apply MDOD to the MS COCO dataset. The proposed method not only approaches multi-object detection in a new way but also improves detection performance. The code is available at https://github.com/yoojy31/MDOD.
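To make the density-estimation view concrete, the following is a minimal sketch of a mixture negative log-likelihood over bounding boxes. It is not the authors' implementation: the Gaussian components with diagonal covariance, the function name `mixture_nll`, and the box encoding `(x1, y1, x2, y2)` are all assumptions made for illustration, since the abstract does not specify the component distribution. In this sketch the mixture parameters are fixed arrays; in a detector they would be predicted by the network for each image.

```python
import numpy as np


def mixture_nll(boxes, pi, mu, sigma):
    """Mean negative log-likelihood of boxes under a diagonal Gaussian mixture.

    Hypothetical helper for illustration only.
    boxes: (N, 4) ground-truth boxes, N may differ per image.
    pi:    (K,)   mixing weights, assumed positive and summing to 1.
    mu:    (K, 4) component means.
    sigma: (K, 4) component standard deviations, assumed positive.
    """
    boxes = np.asarray(boxes, dtype=float)[:, None, :]  # (N, 1, 4)
    # Per-coordinate Gaussian log-density, broadcast to (N, K, 4).
    log_comp = (
        -0.5 * ((boxes - mu) / sigma) ** 2
        - np.log(sigma)
        - 0.5 * np.log(2.0 * np.pi)
    )
    # Sum over the 4 coordinates, then add log mixing weights: (N, K).
    log_joint = np.log(pi) + log_comp.sum(axis=-1)
    # Log-sum-exp over components for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return -log_lik.mean()


# Assumed example values: two mixture components, boxes in normalized coordinates.
pi = np.array([0.5, 0.5])
mu = np.array([[0.2, 0.2, 0.4, 0.4], [0.6, 0.6, 0.9, 0.9]])
sigma = np.full((2, 4), 0.05)

# The same loss handles one box or several, with no assignment step.
loss_one_box = mixture_nll(mu[:1], pi, mu, sigma)
loss_two_boxes = mixture_nll(mu, pi, mu, sigma)
```

Because the loss sums over all components for every ground-truth box, no box has to be matched to a particular output location, which is the property the abstract contrasts with assignment-based training.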