Detecting other agents and forecasting their behavior is an integral part of the modern robotic autonomy stack, especially in safety-critical scenarios entailing human-robot interaction such as autonomous driving. Given the importance of these components, perception and trajectory forecasting have attracted a significant amount of research, resulting in a wide variety of approaches. Common to most works, however, is the use of the same few accuracy-based evaluation metrics, e.g., intersection-over-union, displacement error, and log-likelihood. While these metrics are informative, they are task-agnostic, and outputs that are evaluated as equal can lead to vastly different outcomes in downstream planning and decision making. In this work, we take a step back and critically assess current evaluation metrics, proposing task-aware metrics as a better measure of performance in the systems where perception and forecasting models are deployed. Experiments on an illustrative simulation as well as real-world autonomous driving data validate that our proposed task-aware metrics are able to account for outcome asymmetry and provide a better estimate of a model's closed-loop performance.
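To make the claim about task-agnostic metrics concrete, the following is a minimal toy sketch, not the metric proposed in this work. It assumes a hypothetical scene in which the ego vehicle plans to drive along y = 0 while another agent actually drives in an adjacent lane at y = 3; the scene, the function names, and the use of "minimum gap to the ego plan" as a stand-in for downstream planning impact are all illustrative assumptions.

```python
import numpy as np


def average_displacement_error(pred, truth):
    """Mean Euclidean distance between predicted and true positions."""
    return float(np.mean(np.linalg.norm(pred - truth, axis=1)))


def min_gap_to_plan(pred, ego_plan):
    """Smallest same-timestep distance between a prediction and the ego plan.

    Hypothetical proxy for how strongly a prediction would affect planning.
    """
    return float(np.min(np.linalg.norm(pred - ego_plan, axis=1)))


# Hypothetical scene (assumption): five timesteps, both vehicles move along x.
t = np.arange(5, dtype=float)
ego_plan = np.stack([t, np.zeros_like(t)], axis=1)   # ego lane at y = 0
truth = np.stack([t, np.full_like(t, 3.0)], axis=1)  # agent lane at y = 3

# Two forecasts with the same error magnitude but opposite direction.
pred_toward = truth + np.array([0.0, -1.0])  # errs toward the ego lane
pred_away = truth + np.array([0.0, 1.0])     # errs away from the ego lane

ade_toward = average_displacement_error(pred_toward, truth)
ade_away = average_displacement_error(pred_away, truth)
gap_toward = min_gap_to_plan(pred_toward, ego_plan)
gap_away = min_gap_to_plan(pred_away, ego_plan)

print(f"ADE toward: {ade_toward:.2f}, ADE away: {ade_away:.2f}")
print(f"Gap toward: {gap_toward:.2f}, gap away: {gap_away:.2f}")
```

Both forecasts receive the same displacement error, yet the one that errs toward the ego lane leaves a smaller gap to the planned path and would more plausibly trigger braking or a swerve. This is the kind of outcome asymmetry that an accuracy-only metric cannot distinguish.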