Transformers have recently been used to perform object detection and tracking in the context of autonomous driving. One unique characteristic of these models is that attention weights are computed in each forward pass, offering insight into the model's internals, in particular which parts of the input data it deemed relevant to the given task. Such an attention matrix over the input grid is available for each detected (or tracked) object in every transformer decoder layer. In this work, we investigate the distribution of these attention weights: How do they change across the decoder layers and over the lifetime of a track? Can they be used to infer additional information about an object, such as its detection uncertainty? Especially in unstructured environments, or environments that were uncommon during training, a reliable measure of detection uncertainty is crucial for deciding whether the system can still be trusted.
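To make the quantity under study concrete, the following minimal sketch shows how such per-object cross-attention matrices can be exposed in a DETR-style decoder. This is not the authors' implementation; it assumes a PyTorch setup, and names such as `num_queries`, the grid size, and the layer count are hypothetical placeholders.

```python
# Minimal sketch (assumptions noted above, not the paper's code): a DETR-style
# decoder layer that returns the cross-attention weights between object
# queries and the flattened input grid, so their distribution can be
# inspected per layer and per object.
import torch
import torch.nn as nn

class DecoderLayerWithAttn(nn.Module):
    """Single decoder layer that also returns its cross-attention map."""
    def __init__(self, d_model=256, nhead=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, queries, memory):
        # Self-attention among the object queries.
        q = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
        # Cross-attention onto the input grid; need_weights=True returns the
        # (query x grid) matrix, averaged over heads -- the matrix whose
        # distribution is studied in this work.
        attn_out, attn_weights = self.cross_attn(q, memory, memory,
                                                 need_weights=True)
        q = self.norm2(q + attn_out)
        q = self.norm3(q + self.ffn(q))
        return q, attn_weights  # attn_weights: (batch, num_queries, grid_len)

# Collect one attention matrix per decoder layer for every object query.
layers = nn.ModuleList(DecoderLayerWithAttn() for _ in range(6))
queries = torch.zeros(1, 100, 256)      # 100 object queries (placeholder)
memory = torch.randn(1, 40 * 40, 256)   # flattened feature grid (placeholder)
attn_per_layer = []
for layer in layers:
    queries, w = layer(queries, memory)
    attn_per_layer.append(w)
```

Given these per-layer matrices, one could then compute, e.g., the entropy of each query's attention row as a simple summary statistic when relating attention distributions to detection uncertainty.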