3D point-clouds and 2D images are different visual representations of the physical world. While human vision can understand both representations, computer vision models designed for 2D image understanding and 3D point-cloud understanding are quite different. Our paper investigates the transferability between these two representations: whether the transfer works empirically, what factors affect the transfer performance, and how to make it work even better. We find that the same neural network architectures can indeed be used to understand both images and point-clouds. Moreover, pretrained weights can be transferred from image models to point-cloud models with minimal effort. Specifically, starting from a 2D ConvNet pretrained on an image dataset, we can convert it into a point-cloud model by \textit{inflating} its 2D convolutional filters to 3D and then finetuning only the input, output, and, optionally, normalization layers. The transferred model achieves competitive performance on 3D point-cloud classification and on indoor and driving-scene segmentation, even outperforming a wide range of point-cloud models that adopt task-specific architectures and a variety of tricks.
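To make the inflation step concrete, the following is a minimal PyTorch sketch (not the paper's released code) of one common way to inflate a pretrained 2D convolution into a 3D one: the 2D kernel is replicated along a new depth axis and rescaled so that the inflated filter initially produces approximately the same response. The function name and the choice of depth are illustrative assumptions.

\begin{verbatim}
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    """Inflate a pretrained 2D conv into a 3D conv by replicating the
    2D kernel along a new depth axis and dividing by the depth, so the
    3D filter initially approximates the original 2D response.
    (Illustrative sketch; assumes groups=1 and default dilation.)"""
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(depth, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(depth // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # (out, in, kH, kW) -> (out, in, depth, kH, kW), rescaled by depth
        w2d = conv2d.weight.unsqueeze(2)
        conv3d.weight.copy_(w2d.repeat(1, 1, depth, 1, 1) / depth)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d
\end{verbatim}

Under this scheme, only newly added input/output layers (and, optionally, the normalization layers) would be finetuned on the point-cloud task, while the inflated convolutional filters carry over the image-pretrained knowledge.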