We present a new approach to instilling 4D dynamic object priors into learned 3D representations via unsupervised pre-training. We observe that the dynamic movement of an object through an environment provides important cues about its objectness, and thus propose to imbue learned 3D representations with such dynamic understanding, which can then be effectively transferred to improve performance on downstream 3D semantic scene understanding tasks. We propose a new data augmentation scheme that leverages synthetic 3D shapes moving through static 3D environments, and employ contrastive learning under 3D-4D constraints to encode 4D invariances into the learned 3D representations. Experiments demonstrate that our unsupervised representation learning improves downstream 3D semantic segmentation, object detection, and instance segmentation, and moreover, notably improves performance in data-scarce scenarios.
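To make the 3D-4D contrastive objective concrete, the sketch below shows a point-level InfoNCE loss between matched 3D and 4D features, of the kind such a constraint could use. This is a minimal illustration, not the paper's exact formulation: the function name `info_nce_3d4d`, the feature shapes, and the assumption that row-aligned features from the two backbones form positive pairs are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def info_nce_3d4d(feats_3d, feats_4d, temperature=0.07):
    """Point-level InfoNCE between corresponding 3D and 4D features.

    feats_3d: (N, C) features from a 3D backbone at N sampled points
              of the static scene.
    feats_4d: (N, C) features from a 4D backbone at the corresponding
              points of the augmented scene with a moving synthetic shape.
    Row i of feats_3d and row i of feats_4d are assumed to be a positive
    pair (same physical point); all other rows act as negatives.
    """
    z3 = F.normalize(feats_3d, dim=1)
    z4 = F.normalize(feats_4d, dim=1)
    logits = z3 @ z4.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(z3.size(0), device=z3.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors standing in for backbone outputs.
loss = info_nce_3d4d(torch.randn(256, 96), torch.randn(256, 96))
```

Pulling the two views toward each other at corresponding points while pushing apart non-corresponding ones is one way a loss of this form could encode 4D invariances into the 3D features, which are then the ones transferred to downstream tasks.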