Due to depth ambiguities and occlusions, lifting 2D poses to 3D is a highly ill-posed problem. Well-calibrated distributions of possible poses can make these ambiguities explicit and preserve the resulting uncertainty for downstream tasks. This study shows that previous attempts, which account for these ambiguities by generating multiple hypotheses, produce miscalibrated distributions. We attribute this miscalibration to the use of sample-based metrics such as minMPJPE. In a series of simulations, we show that minimizing minMPJPE, as is commonly done, should converge to the correct mean prediction. However, it fails to correctly capture the uncertainty, thus resulting in a miscalibrated distribution. To mitigate this problem, we propose an accurate and well-calibrated model called the Conditional Graph Normalizing Flow (cGNF). Our model is structured such that a single cGNF can estimate both conditional and marginal densities within the same model, effectively solving a zero-shot density estimation problem. We evaluate cGNF on the Human3.6M dataset and show that it provides a well-calibrated distribution estimate while being close to the state of the art in terms of overall minMPJPE. Furthermore, cGNF outperforms previous methods on occluded joints while remaining well-calibrated.
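The abstract does not spell out how minMPJPE is computed. As a minimal sketch, assuming the definition commonly used for multi-hypothesis pose estimation (the mean per-joint position error of the best of K sampled hypotheses), it might look as follows. The function names `mpjpe` and `min_mpjpe` and the toy arrays are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per-joint position error between two poses of shape (J, 3)."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def min_mpjpe(hypotheses, target):
    """Smallest MPJPE among K hypotheses of shape (K, J, 3).

    Assumed definition: only the best hypothesis contributes to the score.
    """
    per_joint = np.linalg.norm(hypotheses - target[None], axis=-1)  # (K, J)
    return float(per_joint.mean(axis=1).min())


# Toy data (hypothetical): a 2-joint "pose" at the origin and two hypotheses.
target = np.zeros((2, 3))
hypotheses = np.stack([
    target + np.array([3.0, 4.0, 0.0]),  # every joint is off by 5 units
    target + np.array([0.0, 0.0, 1.0]),  # every joint is off by 1 unit
])

best = min_mpjpe(hypotheses, target)
worst = mpjpe(hypotheses[0], target)
```

Under this assumed definition, only the closest hypothesis affects the score and the spread of the remaining samples is ignored. That is consistent with the abstract's claim that optimizing such a metric does not, by itself, yield a calibrated distribution.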