Bird's Eye View (BEV) is a popular representation for processing 3D point clouds and is, by its nature, fundamentally sparse. Motivated by the computational limitations of mobile robot platforms, we take a fast, high-performance BEV 3D object detector, PointPillars, and modify its backbone to maintain and exploit this input sparsity, which reduces runtime. We present results on KITTI, a canonical 3D detection dataset, and on Matterport-Chair, a novel chair detection dataset derived from Matterport3D scenes of real furnished homes. We evaluate runtime characteristics on a desktop GPU, an embedded ML accelerator, and a robot CPU, and show that our method significantly reduces runtime (by 2x or more) on embedded systems with only a modest decrease in detection quality. Our work offers practitioners a new approach to optimizing models for embedded systems: maintain and exploit input sparsity throughout the entire pipeline to reduce runtime and resource usage while preserving detection performance. All models, weights, experimental configurations, and datasets used are publicly available.
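To make the sparsity argument concrete, the following is a minimal sketch, not the authors' implementation, of why a BEV grid is sparse and how a convolution can be evaluated only at occupied cells so that the sparsity is maintained from layer to layer. All function names (`points_to_bev_pillars`, `pillar_features`, `submanifold_conv3x3`) are hypothetical. The mean-height pillar feature is a stand-in for PointPillars' learned pillar encoder, and the active-site ("submanifold") convolution is one common way to preserve a sparse pattern; it is assumed here for illustration rather than taken from the paper.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float, float]


def points_to_bev_pillars(points: List[Point], x_min: float, y_min: float,
                          cell_size: float, grid_w: int, grid_h: int) -> Dict[Cell, List[Point]]:
    """Bin (x, y, z) points into BEV cells; only occupied cells are stored."""
    pillars: Dict[Cell, List[Point]] = {}
    for x, y, z in points:
        ix = int((x - x_min) // cell_size)
        iy = int((y - y_min) // cell_size)
        if 0 <= ix < grid_w and 0 <= iy < grid_h:  # drop points outside the grid
            pillars.setdefault((ix, iy), []).append((x, y, z))
    return pillars


def pillar_features(pillars: Dict[Cell, List[Point]]) -> Dict[Cell, float]:
    """Toy per-pillar feature: mean point height (hypothetical stand-in for a learned encoder)."""
    return {c: sum(p[2] for p in pts) / len(pts) for c, pts in pillars.items()}


def submanifold_conv3x3(features: Dict[Cell, float],
                        kernel: List[List[float]]) -> Dict[Cell, float]:
    """3x3 cross-correlation evaluated only at active (occupied) cells.

    Inactive neighbours contribute zero and no new cells become active,
    so the output has exactly the same sparsity pattern as the input.
    """
    out: Dict[Cell, float] = {}
    for (ix, iy) in features:
        acc = 0.0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                acc += kernel[dx + 1][dy + 1] * features.get((ix + dx, iy + dy), 0.0)
        out[(ix, iy)] = acc
    return out


if __name__ == "__main__":
    pts = [(0.1, 0.1, 1.0), (0.2, 0.3, 3.0), (0.6, 0.1, 2.0), (5.0, 5.0, 4.0)]
    grid_w = grid_h = 20  # 10 m x 10 m at 0.5 m cells
    feats = pillar_features(points_to_bev_pillars(pts, 0.0, 0.0, 0.5, grid_w, grid_h))
    box = [[1.0] * 3 for _ in range(3)]
    out = submanifold_conv3x3(feats, box)
    # Multiply-accumulates: dense evaluates every cell, sparse only active ones.
    print(len(feats), "active of", grid_w * grid_h, "cells")  # 3 active of 400 cells
    print("dense MACs:", grid_w * grid_h * 9, "sparse MACs:", len(feats) * 9)
```

The cost of the sparse layer scales with the number of occupied cells rather than the full grid area, which is the kind of saving a sparsity-preserving backbone aims to carry through every layer. A production system would use a dedicated sparse-convolution library rather than Python dictionaries.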