A recent approach to object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person. While this center-point regression is simple and efficient, we argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. This point set is arranged to reflect a good initialization for the given task, such as modes in the training data for pose estimation, which lie closer to the ground truth than the central point and provide more informative features for regression. As the utility of a point set depends on how well its scale, aspect ratio and rotation match the target, we adopt the anchor box technique of sampling these transformations to generate additional point-set candidates. We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation. Our results show that this general-purpose approach can achieve performance competitive with state-of-the-art methods for each of these tasks.
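The idea of generating point-set candidates by sampling scale, aspect ratio and rotation, then regressing from the candidate that best fits the target, can be illustrated with a small sketch. It is not the authors' implementation. Everything concrete here is a hypothetical choice made for the example: the template (the four corners of a unit square standing in for a box-shaped point set), the sampled scales, aspect ratios and angles, all function names, the per-point offset encoding of the regression target, and the mean point-to-point distance used to pick the best-matching anchor.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
PointSet = List[Point]


def place_template(template: Sequence[Point], center: Point, scale: float,
                   aspect_ratio: float, angle: float) -> PointSet:
    """Scale and stretch a normalized template (aspect_ratio = width / height),
    rotate it by `angle` radians about the origin, then translate it to `center`."""
    sx = scale * math.sqrt(aspect_ratio)
    sy = scale / math.sqrt(aspect_ratio)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    placed = []
    for x, y in template:
        x, y = x * sx, y * sy                  # apply scale and aspect ratio
        xr = x * cos_a - y * sin_a             # apply rotation
        yr = x * sin_a + y * cos_a
        placed.append((center[0] + xr, center[1] + yr))
    return placed


def generate_point_set_anchors(template, center, scales, aspect_ratios, angles):
    """One candidate per (scale, aspect ratio, angle) combination, as with anchor boxes."""
    return [place_template(template, center, s, r, a)
            for s, r, a in product(scales, aspect_ratios, angles)]


def regression_targets(anchor, target):
    """Hypothetical target encoding: per-point offsets from each anchor point
    to the corresponding ground-truth point."""
    if len(anchor) != len(target):
        raise ValueError("anchor and target must have the same number of points")
    return [(tx - ax, ty - ay) for (ax, ay), (tx, ty) in zip(anchor, target)]


def best_matching_anchor(anchors, target) -> int:
    """Index of the anchor with the smallest mean point-to-point distance to
    the target (an assumed matching criterion, chosen for illustration)."""
    def mean_distance(anchor):
        return sum(math.dist(p, q) for p, q in zip(anchor, target)) / len(target)
    return min(range(len(anchors)), key=lambda i: mean_distance(anchors[i]))


if __name__ == "__main__":
    unit_square = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
    anchors = generate_point_set_anchors(unit_square, (100.0, 100.0),
                                         scales=[32.0, 64.0],
                                         aspect_ratios=[0.5, 1.0, 2.0],
                                         angles=[0.0, math.pi / 2])
    gt = [(80.0, 80.0), (120.0, 80.0), (120.0, 120.0), (80.0, 120.0)]
    i = best_matching_anchor(anchors, gt)
    print(len(anchors), i, regression_targets(anchors[i], gt))
```

In this toy run, 2 scales × 3 aspect ratios × 2 angles give 12 candidates at one location. The unrotated 32-pixel square matches the 40-pixel ground-truth square best, leaving small offsets of 4 pixels per point to regress. Because points are compared in corresponding order, a 90° rotation of the same square scores poorly; this ordering sensitivity is one reason sampled rotations matter for targets such as human poses.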