A recent approach to object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person. While this center-point regression is simple and efficient, we argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. This point set is arranged to reflect a good initialization for the given task, such as modes in the training data for pose estimation, which lie closer to the ground truth than the central point and provide more informative features for regression. As the utility of a point set depends on how well its scale, aspect ratio and rotation match the target, we adopt the anchor box technique of sampling these transformations to generate additional point-set candidates. We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation. Our results show that this general-purpose approach can achieve performance competitive with state-of-the-art methods for each of these tasks. Code is available at \url{https://github.com/FangyunWei/PointSetAnchor}.
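The candidate-generation step described above, sampling scale, aspect ratio and rotation to turn one template point set into many, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_point_set_anchors`, the square template, and the sampled values are all hypothetical, and treating the aspect ratio as an area-preserving stretch is an assumption made here for simplicity.

```python
import numpy as np

def make_point_set_anchors(template, scales, aspect_ratios, rotations_deg):
    """Return an array of shape (A, K, 2): one transformed copy of the
    (K, 2) template per (scale, aspect ratio, rotation) combination."""
    template = np.asarray(template, dtype=float)
    centered = template - template.mean(axis=0)  # transform about the centroid
    anchors = []
    for s in scales:
        for r in aspect_ratios:
            # Assumption: stretch x by sqrt(r) and y by 1/sqrt(r), so the
            # width/height ratio changes by r while the area is preserved.
            stretch = np.array([s * np.sqrt(r), s / np.sqrt(r)])
            for deg in rotations_deg:
                t = np.deg2rad(deg)
                rot = np.array([[np.cos(t), -np.sin(t)],
                                [np.sin(t),  np.cos(t)]])
                # Points are row vectors, so rotate with the transpose.
                anchors.append((centered * stretch) @ rot.T)
    return np.stack(anchors)

# Hypothetical 4-point template (corners of a unit square), not from the paper.
template = [[0, 0], [1, 0], [1, 1], [0, 1]]
anchors = make_point_set_anchors(
    template,
    scales=[1.0, 2.0],
    aspect_ratios=[0.5, 1.0, 2.0],
    rotations_deg=[-30, 0, 30],
)
print(anchors.shape)  # (18, 4, 2)
```

In a detector or pose estimator, each such candidate would be translated to an image location and used as the starting point for regression; for pose estimation the template would presumably be a pose mode from the training data rather than a square.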