XPASC: Measuring Generalization in Weak Supervision by Explainability and Association

Weak supervision is leveraged in a wide range of domains and tasks because it can create massive amounts of labeled data with little manual effort. Standard approaches use labeling functions to specify signals that are relevant for the labeling. It has been conjectured that weakly supervised models over-rely on those signals and, as a result, suffer from overfitting. To verify this assumption, we introduce a novel method, XPASC (eXPlainability-Association SCore), for measuring the generalization of a model trained with a weakly supervised dataset. Considering the occurrences of features, classes and labeling functions in a dataset, XPASC takes into account the relevance of each feature for the predictions of the model as well as the associations of the feature with the class and the labeling function, respectively. The association in XPASC can be measured in two variants: XPASC-CHI SQUARE measures associations relative to their statistical significance, while XPASC-PPMI measures association strength more generally. We use XPASC to analyze KnowMAN, an adversarial architecture intended to control the degree of generalization from the labeling functions and thus to mitigate the problem of overfitting. On the one hand, we show that KnowMAN is able to control the degree of generalization through a hyperparameter. On the other hand, results and qualitative analysis show that generalization and performance do not relate one-to-one, and that the highest degree of generalization does not necessarily imply the best performance. Therefore, methods that allow for controlling the amount of generalization can achieve the right degree of benign overfitting. Our contributions in this study are i) the XPASC score to measure generalization in weakly supervised models, ii) evaluation of XPASC across datasets and models and iii) the release of the XPASC implementation.
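To make the association component more concrete, the sketch below computes positive pointwise mutual information (PPMI) between features and labels from toy co-occurrence counts, using the standard definition PPMI(x, y) = max(0, log2(p(x, y) / (p(x) p(y)))). It is an illustration only, not the released XPASC implementation: the function name `ppmi_table` and the toy data are assumptions, and the way XPASC combines association with feature relevance is not shown here.

```python
import math
from collections import Counter


def ppmi_table(pairs):
    """Return {(feature, label): PPMI} for observed (feature, label) pairs.

    Hypothetical helper for illustration; `label` could stand for a class
    or for a labeling function, depending on which association is measured.
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)                      # count(feature, label)
    feat = Counter(f for f, _ in pairs)         # count(feature)
    lab = Counter(l for _, l in pairs)          # count(label)
    table = {}
    for (f, l), c in joint.items():
        # PMI = log2( p(f, l) / (p(f) * p(l)) ) = log2( c * n / (count_f * count_l) )
        pmi = math.log2(c * n / (feat[f] * lab[l]))
        table[(f, l)] = max(0.0, pmi)           # clip negative associations to zero
    return table


# Toy data (assumed): each pair is one occurrence of a feature in an
# instance carrying the given label.
toy_pairs = (
    [("free", "spam")] * 3 + [("free", "ham")]
    + [("hello", "ham")] * 3 + [("hello", "spam")]
)
for key, value in sorted(ppmi_table(toy_pairs).items()):
    print(key, round(value, 3))
```

In this toy example, a feature that co-occurs with a label more often than chance receives a positive score, while pairs that co-occur less often than chance are clipped to zero.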
