We present TransLPC, a novel detection model for large point clouds that is based on a transformer architecture. While object detection with transformers has been an active field of research, it has proved difficult to apply such models to point clouds that span a large area, such as those common in autonomous driving with lidar or radar data. TransLPC remedies these issues: the structure of the transformer model is modified to allow for larger input sequence lengths, sufficient for large point clouds. In addition, we propose a novel query refinement technique that improves detection accuracy while retaining a memory-friendly number of transformer decoder queries. Between decoder layers, the queries are efficiently repositioned closer to the bounding boxes they estimate. This simple technique significantly improves detection accuracy, as we show on real-world lidar data from the challenging nuScenes dataset. Furthermore, the proposed method is compatible with existing transformer-based solutions that require object detection, e.g. for joint multi-object tracking and detection, and enables them to be used in conjunction with large point clouds.
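To make the query refinement idea concrete, the following is a minimal sketch (not the authors' implementation) of repositioning decoder queries between layers: after each layer, a box head predicts a box per query, and the query's 2-D reference point is moved to the centre of that box so subsequent layers operate around a better location. All names and shapes here (RefinedDecoder, d_model, the BEV box parameterisation, the shared decoder layer) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RefinedDecoder(nn.Module):
    """Illustrative decoder with per-layer query repositioning (hypothetical sketch)."""

    def __init__(self, d_model=256, num_layers=6, num_queries=300):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        # The same layer instance is reused here purely for brevity (shared weights).
        self.layers = nn.ModuleList([layer for _ in range(num_layers)])
        self.queries = nn.Embedding(num_queries, d_model)
        self.box_head = nn.Linear(d_model, 4)   # (cx, cy, w, l) in normalised BEV coordinates
        self.ref_embed = nn.Linear(2, d_model)  # encodes a 2-D reference point into the query space

    def forward(self, memory):
        # memory: encoded point-cloud features, shape (B, S, d_model)
        B = memory.size(0)
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)
        ref = torch.full((B, q.size(1), 2), 0.5, device=memory.device)  # start at the BEV centre
        boxes_per_layer = []
        for layer in self.layers:
            # Condition each query on its current reference point, then run the decoder layer.
            q = layer(q + self.ref_embed(ref), memory)
            boxes = self.box_head(q).sigmoid()
            boxes_per_layer.append(boxes)
            # Query refinement: reposition the reference point at the predicted box centre.
            ref = boxes[..., :2].detach()
        return boxes_per_layer
```

The key step is the last line of the loop: the reference point follows the current box estimate, so the query count can stay small while each query still ends up attending near the object it is responsible for.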