We introduce PointConvFormer, a novel building block for point-cloud-based deep network architectures. Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights are based only on relative position, and Transformers, which utilize feature-based attention. In PointConvFormer, attention computed from feature differences between neighboring points is used to modify the convolutional weights at each point. Hence, the invariances of point convolution are preserved, while attention helps to select relevant points in the neighborhood. PointConvFormer is suitable for multiple tasks that require point-level detail, such as segmentation and scene flow estimation. We experiment on both tasks with multiple datasets, including ScanNet, SemanticKITTI, FlyingThings3D, and KITTI. Our results show that PointConvFormer substantially outperforms classic convolutions, regular Transformers, and voxelized sparse convolution approaches with much smaller and faster networks. Visualizations show that PointConvFormer behaves similarly to convolution on flat areas, whereas the neighborhood-selection effect is stronger on object boundaries, indicating that it combines the best of both worlds.
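The core mechanism described above can be sketched in a few lines. This is a minimal, hypothetical NumPy simplification, not the paper's implementation: the learned weight and attention networks are stood in for by tiny one-layer MLPs (`mlp`), and the exact aggregation form is an assumption. It illustrates the key idea that convolutional weights depend only on relative positions (giving translation invariance), while scalar attention comes from feature differences between neighbors.

```python
import numpy as np

def mlp(x, W, b):
    """Tiny one-layer MLP with ReLU, standing in for the learned networks (a simplification)."""
    return np.maximum(x @ W + b, 0.0)

def pointconvformer_point(p_i, f_i, p_nbrs, f_nbrs, Ww, bw, Wa, ba):
    """Aggregate one point's k-neighborhood (hypothetical sketch of the abstract's idea).

    p_i: (3,) query point position, f_i: (c,) its features.
    p_nbrs: (k, 3) neighbor positions, f_nbrs: (k, c) neighbor features.
    """
    rel = p_nbrs - p_i                      # (k, 3) relative positions only -> invariance
    w = mlp(rel, Ww, bw)                    # (k, c) position-based convolutional weights
    diff = f_nbrs - f_i                     # (k, c) feature differences between neighbors
    logits = mlp(diff, Wa, ba).sum(axis=1)  # (k,) scalar attention logit per neighbor
    att = np.exp(logits - logits.max())
    att = att / att.sum()                   # softmax over the neighborhood
    # Attention reweights (selects) neighbors; conv weights filter their features.
    return (att[:, None] * w * f_nbrs).sum(axis=0)  # (c,) aggregated output feature
```

Because the positional branch sees only `p_nbrs - p_i`, translating the whole point cloud leaves the output unchanged, which is the invariance the abstract refers to.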