We introduce an automatic, end-to-end method for recovering the 3D pose and shape of dogs from monocular internet images. The large variation in shape between dog breeds, significant occlusion, and the low quality of internet images make this a challenging problem. We learn a richer prior over shapes than previous work, which helps regularize parameter estimation. We demonstrate results on the Stanford Dog dataset, an 'in the wild' dataset of 20,580 dog images for which we have collected 2D joint and silhouette annotations, split for training and evaluation. To capture the large shape variety of dogs, we show that the natural variation in the 2D dataset is enough to learn a detailed 3D prior through expectation maximization (EM). As a by-product of training, we generate a new parameterized model (including limb scaling), SMBLD, which we release alongside our new annotation dataset, StanfordExtra, to the research community.