Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation

The two-stage object pose estimation paradigm first detects semantic keypoints on the image and then estimates the 6D pose by minimizing reprojection errors. Despite performing well on standard benchmarks, existing techniques offer no provable guarantees on the quality and uncertainty of the estimation. In this paper, we inject two fundamental changes, namely conformal keypoint detection and geometric uncertainty propagation, into the two-stage paradigm and propose the first pose estimator that endows an estimation with provable and computable worst-case error bounds. On the one hand, conformal keypoint detection applies the statistical machinery of inductive conformal prediction to convert heuristic keypoint detections into circular or elliptical prediction sets that cover the groundtruth keypoints with a user-specified marginal probability (e.g., 90%). On the other hand, geometric uncertainty propagation propagates the geometric constraints on the keypoints to the 6D object pose, leading to a Pose UnceRtainty SEt (PURSE) that guarantees coverage of the groundtruth pose with the same probability. The PURSE, however, is a nonconvex set that does not directly yield an estimated pose or its uncertainty. Therefore, we develop RANdom SAmple averaGing (RANSAG) to compute an average pose and apply semidefinite relaxation to upper bound the worst-case errors between the average pose and the groundtruth. On the LineMOD Occlusion dataset we demonstrate that: (i) the PURSE covers the groundtruth with valid probabilities; (ii) the worst-case error bounds provide correct uncertainty quantification; and (iii) the average pose achieves accuracy better than or comparable to that of representative methods based on sparse keypoints.
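The two ingredients named above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes circular prediction sets, the Euclidean pixel distance between a detected and a groundtruth keypoint as the nonconformity score, and a pinhole camera with intrinsics `K`; the function names `calibrate_radius` and `in_purse` are hypothetical. The first function computes the standard inductive conformal prediction quantile from a calibration set. The second checks whether a candidate pose lies in the set of poses whose reprojected model keypoints all fall inside their prediction circles.

```python
import math
import numpy as np

def calibrate_radius(pred, truth, epsilon):
    """Radius of a circular prediction set via inductive conformal prediction.

    pred, truth: (n, 2) arrays of detected / groundtruth calibration keypoints.
    epsilon: user-specified miscoverage rate (0.1 targets 90% marginal coverage).
    Assumption: the nonconformity score is the Euclidean pixel distance.
    """
    scores = np.sort(np.linalg.norm(pred - truth, axis=1))
    n = scores.shape[0]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - epsilon)).
    k = math.ceil((n + 1) * (1.0 - epsilon))
    if k > n:
        # Too few calibration samples for this level: the set is unbounded.
        return math.inf
    return float(scores[k - 1])

def in_purse(R, t, K, model_pts, centers, radii):
    """Check whether pose (R, t) reprojects every 3D model keypoint
    inside its circular prediction set (center, radius) on the image."""
    cam = model_pts @ R.T + t              # keypoints in the camera frame
    if np.any(cam[:, 2] <= 0):             # behind the camera: not a valid pose
        return False
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]            # perspective division
    return bool(np.all(np.linalg.norm(uv - centers, axis=1) <= radii))

# Synthetic calibration data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 640.0, size=(500, 2))
pred = truth + rng.normal(scale=3.0, size=(500, 2))
radius = calibrate_radius(pred[:400], truth[:400], epsilon=0.1)
covered = np.linalg.norm(pred[400:] - truth[400:], axis=1) <= radius
print("radius:", radius, "empirical coverage:", covered.mean())
```

The coverage guarantee is marginal: it holds on average over the draw of calibration and test data, provided they are exchangeable, so the empirical coverage on a single held-out split fluctuates around the target level.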
