We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure, including its 3D parts, kinematic structure, and motion constraints. At its core is a transformer network, Part Articulation Transformer, which processes a point cloud of the input mesh using a flexible and scalable architecture to predict all the aforementioned attributes with native multi-joint support. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate lifts the network's feed-forward prediction to the input mesh, yielding a fully articulated 3D model in seconds, much faster than prior approaches that require per-object optimization. Particulate can also accurately infer the articulated structure of AI-generated 3D assets, enabling full-fledged extraction of articulated 3D objects from a single (real or synthetic) image when combined with an off-the-shelf image-to-3D generator. We further introduce a challenging new benchmark for 3D articulation estimation, curated from high-quality public 3D assets, and redesign the evaluation protocol to be more consistent with human preferences. Quantitative and qualitative results show that Particulate significantly outperforms state-of-the-art approaches.