Auto-evaluation aims to estimate the performance of a trained model on any test dataset without human annotations. Most existing methods use global statistics of the features extracted by the model as the representation of a dataset. This ignores the influence of the classification head and discards the model's category-wise confusion information. However, the ratios of instances assigned to different categories, together with their confidence scores, reflect how many instances in which categories are difficult for the model to classify, and therefore provide significant indicators of both overall and category-wise performance. In this paper, we propose a Confidence-based Category Relation-aware Regression ($C^2R^2$) method. $C^2R^2$ divides all instances in a meta-set into different categories according to their confidence scores and extracts a global representation from them. For each category, $C^2R^2$ encodes its local confusion relations to other categories into a local representation. The overall and category-wise performance are then regressed from the global and local representations, respectively. Extensive experiments show the effectiveness of our method.
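As a rough illustration of the idea, the following minimal sketch builds a confidence-based global representation and category-wise local representations from the softmax outputs of a model on an unlabeled set. It is not the encoder or regressor used by $C^2R^2$: the function name `dataset_representations`, the equal-width confidence bins, and the use of the mean softmax vector as a stand-in for "confusion relations" are all assumptions made for illustration.

```python
def dataset_representations(probs, n_bins=4):
    """Hypothetical helper: summarize softmax outputs of an unlabeled dataset.

    probs: list of N rows, each a list of K softmax probabilities.
    Returns (global_rep, local_rep):
      global_rep[c][b] = fraction of ALL instances predicted as category c
                         whose confidence falls into bin b (assumed binning).
      local_rep[c][j]  = mean probability given to category j by instances
                         predicted as c (assumed proxy for confusion of c with j).
    """
    n = len(probs)
    k = len(probs[0])
    global_rep = [[0.0] * n_bins for _ in range(k)]
    sums = [[0.0] * k for _ in range(k)]
    counts = [0] * k

    for p in probs:
        # Assign the instance to its predicted category (argmax of softmax).
        pred = max(range(k), key=lambda j: p[j])
        # Confidence is the maximum softmax score, clipped to [0, 1].
        conf = min(max(p[pred], 0.0), 1.0)
        # Equal-width bin index; a confidence of exactly 1.0 goes to the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        global_rep[pred][b] += 1.0 / n
        counts[pred] += 1
        for j in range(k):
            sums[pred][j] += p[j]

    # Categories that received no predictions keep an all-zero local vector.
    local_rep = [
        [s / counts[c] for s in sums[c]] if counts[c] else [0.0] * k
        for c in range(k)
    ]
    return global_rep, local_rep


if __name__ == "__main__":
    toy_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
    g, l = dataset_representations(toy_probs, n_bins=2)
    print(g)
    print(l)
```

In a setup like the one the abstract describes, such representations would be computed for every sample set in a meta-set with known labels, and a regressor would then be fit to map the global representation to overall accuracy and each local representation to the accuracy of its category.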