Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for training 3D detectors, particularly on large-scale LiDAR data. Although semantic class labels naturally follow a long-tailed distribution, these benchmarks focus on only a few common classes (e.g., pedestrian and car) and neglect many rare classes in the tail (e.g., debris and stroller). AVs must nonetheless detect rare classes to operate safely. Moreover, semantic classes are often organized within a hierarchy; e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian. Such hierarchical relationships are often ignored, which may lead to misleading estimates of performance and missed opportunities for algorithmic innovation. We address these challenges by formally studying the problem of Long-Tailed 3D Detection (LT3D), which evaluates on all classes, including those in the tail. We evaluate and innovate upon popular 3D detection codebases, such as CenterPoint and PointPillars, adapting them for LT3D. We develop hierarchical losses that promote feature sharing across common and rare classes, as well as improved detection metrics that award partial credit to "reasonable" mistakes that respect the hierarchy (e.g., mistaking a child for an adult). Finally, we point out that accuracy on fine-grained tail classes improves particularly through multimodal fusion of RGB images with LiDAR: small fine-grained classes are hard to identify from sparse LiDAR geometry alone, suggesting that multimodal cues are crucial to long-tailed 3D detection. Our modifications improve accuracy by 5% AP on average over all classes and dramatically improve AP for rare classes (e.g., stroller AP improves from 3.6 to 31.6).
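To make the idea of awarding partial credit to "reasonable" mistakes concrete, the following is a minimal sketch and not the metric defined in the paper. It assumes a hypothetical class hierarchy (the `PARENT` table, the `object` root, and the inclusion of `adult` as a sibling of `child` are illustrative), a hypothetical near-miss weight of 0.5, and predictions that have already been matched to ground-truth boxes. It scores labels only and does not compute AP.

```python
# Hypothetical hierarchy (class -> parent), for illustration only.
# Only "child and construction-worker under pedestrian" comes from the abstract.
PARENT = {
    "adult": "pedestrian",
    "child": "pedestrian",
    "construction-worker": "pedestrian",
    "pedestrian": "object",
    "car": "object",
    "debris": "object",
}
ROOT = "object"  # assumed catch-all root; sharing only the root earns no credit


def ancestors(label):
    """Return the chain [label, parent, ..., root] for a class label."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def partial_credit(pred, gt, near_miss=0.5):
    """Score one matched detection: 1.0 if exact, `near_miss` if the two
    labels share a non-root ancestor in the hierarchy, otherwise 0.0."""
    if pred == gt:
        return 1.0
    shared = set(ancestors(pred)) & set(ancestors(gt))
    shared.discard(ROOT)
    return near_miss if shared else 0.0


def mean_credit(pairs):
    """Average partial credit over (predicted, ground-truth) label pairs."""
    return sum(partial_credit(p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    matches = [("child", "child"), ("adult", "child"), ("car", "child")]
    for pred, gt in matches:
        print(f"{pred:>6} vs {gt}: {partial_credit(pred, gt)}")
    print("mean credit:", mean_credit(matches))
```

Under these assumptions, mistaking a child for an adult earns half credit because both labels fall under pedestrian, whereas mistaking a child for a car earns none. A full detection metric would additionally need box matching and a precision-recall computation per class.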