Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem. Most prior methods rely on large-scale image datasets, dense temporal correspondence, or human annotations like camera pose, 2D keypoints, and shape templates. We propose Hi-LASSIE, which performs 3D articulated reconstruction from only 20-30 online images in the wild without any user-defined shape or skeleton templates. We follow the recent work of LASSIE that tackles a similar problem setting and make two significant advances. First, instead of relying on a manually annotated 3D skeleton, we automatically estimate a class-specific skeleton from the selected reference image. Second, we improve the shape reconstructions with novel instance-specific optimization strategies that allow reconstructions to faithful fit on each instance while preserving the class-specific priors learned across all images. Experiments on in-the-wild image ensembles show that Hi-LASSIE obtains higher quality state-of-the-art 3D reconstructions despite requiring minimum user input.
翻译:自动估算 3D 骨架、 形状、 相机视图以及来自 稀少的 边缘图像群的局部表达是一个严重受限制且具有挑战性的问题。 多数先前的方法都依赖于大型图像数据集、 密集的时间通信、 或像相机姿势、 2D 键点和形状模板这样的人文说明。 我们建议 Hi- LASSIE, 它从野外仅执行 20- 30 个在线图像的 3D 分解重建, 没有任何用户定义的形状或结构模板 。 我们跟踪 LASSIE 最近的工作, 它处理类似的问题设置, 并取得了两个重大进步。 首先, 我们不依靠一个手动的 3D 3D 骨架, 我们自动根据所选的参考图像来估计一个特定等级的骨架。 其次, 我们用新的具体实例优化战略改进形状重建, 使每个实例都能够忠实于每个实例, 同时保存在所有图像中学习到的班级前的图像。 实验显示, H- LASSIE 获得更高质量的 3D 3D 重建, 尽管需要最小的用户投入 。