This paper examines performance evaluation criteria for basic vision tasks involving sets of objects, namely object detection, instance-level segmentation, and multi-object tracking. The ranking of algorithms under an existing criterion can fluctuate with the choice of parameters, e.g., the Intersection over Union (IoU) threshold, making its evaluations unreliable. More importantly, there is no means to verify whether the evaluations of a criterion can be trusted. This work suggests a notion of trustworthiness for performance criteria, which requires (i) robustness to parameters for reliability, (ii) contextual meaningfulness in sanity tests, and (iii) consistency with mathematical requirements such as the metric properties. We observe that these requirements were overlooked by many widely used criteria, and explore alternative criteria using metrics for sets of shapes. We also assess all these criteria against the suggested requirements for trustworthiness.
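To illustrate the parameter sensitivity noted above, the following is a minimal sketch rather than the paper's evaluation protocol. It assumes axis-aligned bounding boxes, a fixed one-to-one pairing of predictions with ground truths, and a simple true-positive count as the criterion. The boxes, the two detectors, and the names `iou` and `true_positives` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max); this layout is an assumption of the sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def true_positives(preds: List[Box], truths: List[Box], threshold: float) -> int:
    """Count pre-paired predictions whose IoU with the ground truth meets the threshold."""
    return sum(iou(p, t) >= threshold for p, t in zip(preds, truths))


# Hypothetical data: three identical ground-truth objects.
truths: List[Box] = [(0, 0, 10, 10)] * 3

# Detector A localizes every object moderately well (IoU = 2/3 each).
detector_a: List[Box] = [(2, 0, 12, 10)] * 3

# Detector B localizes two objects tightly (IoU about 0.90) and one poorly (IoU about 0.43).
detector_b: List[Box] = [(0.5, 0, 10.5, 10), (0.5, 0, 10.5, 10), (4, 0, 14, 10)]

for threshold in (0.5, 0.75):
    tp_a = true_positives(detector_a, truths, threshold)
    tp_b = true_positives(detector_b, truths, threshold)
    print(f"IoU threshold {threshold}: A has {tp_a} TP, B has {tp_b} TP")
```

With these hypothetical boxes, detector A scores more true positives at a threshold of 0.5 and detector B scores more at 0.75, so the ranking of the two detectors reverses with the choice of threshold alone.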