Rich geometric understanding of the world is an important component of many robotic applications such as planning and manipulation. In this paper, we present a modular pipeline for pose and shape estimation of objects from RGB-D images given their category. The core of our method is a generative shape model, which we integrate with a novel initialization network and a differentiable renderer to enable 6D pose and shape estimation from a single or multiple views. We investigate the use of discretized signed distance fields as an efficient shape representation for fast analysis-by-synthesis optimization. Our modular framework enables multi-view optimization and extensibility. We demonstrate the benefits of our approach over state-of-the-art methods in several experiments on both synthetic and real data. We open-source our approach at https://github.com/roym899/sdfest.