Transformer models have achieved promising performance in point cloud segmentation. However, most existing attention schemes apply the same feature learning paradigm to all points and overlook the enormous differences in size among scene objects. In this paper, we propose the Size-Aware Transformer (SAT), which tailors effective receptive fields to objects of different sizes. Our SAT achieves size-aware learning in two steps: introducing multi-scale features into each attention layer and allowing each point to choose its attentive fields adaptively. It contains two key designs: the Multi-Granularity Attention (MGA) scheme and the Re-Attention module. MGA addresses two challenges: efficiently aggregating tokens from distant areas and preserving multi-scale features within a single attention layer. Specifically, point-voxel cross attention is proposed to address the first challenge, and a shunted strategy based on standard multi-head self-attention is applied to solve the second. The Re-Attention module dynamically adjusts, for each point, the attention scores assigned to the fine- and coarse-grained features output by MGA. Extensive experimental results demonstrate that SAT achieves state-of-the-art performance on the S3DIS and ScanNetV2 datasets. Our SAT also achieves the most balanced per-category performance among all compared methods, which illustrates its superiority in modeling categories of different sizes. Our code and model will be released after the acceptance of this paper.
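To make the size-aware re-weighting idea more concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation): it assumes a fine-grained and a coarse-grained attention branch already produce per-point features, and learns per-point scores to fuse them, in the spirit of the Re-Attention module. The module and variable names are illustrative assumptions.

```python
# Hedged sketch: per-point re-weighting of fine- and coarse-grained
# attention outputs. Assumes both branches (e.g. point-level and
# voxel-level attention in MGA) are computed elsewhere.
import torch
import torch.nn as nn


class ReAttentionSketch(nn.Module):
    """Toy illustration: learn one score per granularity for each point."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.ReLU(),
            nn.Linear(dim // 2, 2),  # two scores: fine- and coarse-grained
        )

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine, coarse: (N, C) per-point features from the two granularities
        w = torch.softmax(self.score(fine + coarse), dim=-1)  # (N, 2) per-point weights
        return w[:, :1] * fine + w[:, 1:] * coarse            # size-aware fusion


if __name__ == "__main__":
    n, c = 1024, 64
    fine = torch.randn(n, c)    # e.g. output of small-receptive-field attention
    coarse = torch.randn(n, c)  # e.g. output of voxel-level, large-receptive-field attention
    fused = ReAttentionSketch(c)(fine, coarse)
    print(fused.shape)  # torch.Size([1024, 64])
```

Intuitively, a point on a small object (e.g. a chair leg) can assign a higher weight to the fine-grained branch, while a point on a large object (e.g. a wall) can lean on the coarse-grained branch, which is the behavior the abstract describes.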