First is Better Than Last for Training Data Influence

The ability to identify influential training examples enables us to debug training data and explain model behavior. Existing techniques to do so are based on the flow of training data influence through the model parameters. For large models in NLP applications, it is often computationally infeasible to study this flow through all model parameters, so techniques usually pick the last layer of weights. However, we observe that since the activation connected to the last layer of weights contains "shared logic", the data influence calculated via the last layer weights is prone to a "cancellation effect", where the data influences of different examples have large magnitudes that contradict each other. The cancellation effect lowers the discriminative power of the influence score, and deleting influential examples according to this measure often does not change the model's behavior by much. To mitigate this, we propose a technique called TracIn-WE that modifies a method called TracIn to operate on the word embedding layer instead of the last layer, where the cancellation effect is less severe. One potential concern is that influence based on the word embedding layer may not encode sufficient high-level information. However, we find that gradients (unlike embeddings) do not suffer from this, possibly because they chain through higher layers. We show that TracIn-WE outperforms other data influence methods applied on the last layer by 4-10× on the case deletion evaluation on three language classification tasks. In addition, TracIn-WE can produce scores not just at the level of the overall training input, but also at the level of words within the training input, a further aid in debugging.
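To make the idea of restricting TracIn to the word embedding layer concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy mean-pooled bag-of-embeddings binary classifier with logistic loss, so that gradients can be written by hand. The function names (embedding_grads, word_level_scores, tracin_we), the toy model, and the sign convention (a plain TracIn-style sum over checkpoints of learning rate times gradient dot product) are all assumptions made for illustration. The point it shows is that the embedding gradient of an input is nonzero only for the words that input contains, so the score decomposes into per-word terms over words shared by the training and test inputs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def embedding_grads(E, w, tokens, label):
    """Gradient of the logistic loss w.r.t. each embedding row used by one input.

    Hypothetical toy model: h = mean(E[tokens]), p = sigmoid(w . h).
    Returns {token_id: gradient vector}; rows for absent words are zero
    and are simply omitted.
    """
    tokens = np.asarray(tokens)
    h = E[tokens].mean(axis=0)
    p = sigmoid(w @ h)
    grads = {}
    for t in tokens:
        t = int(t)
        # d loss / d E[t] = (p - y) * w / n, accumulated once per occurrence
        grads[t] = grads.get(t, np.zeros_like(w)) + (p - label) * w / len(tokens)
    return grads


def word_level_scores(checkpoints, lr, train, test):
    """Per-word contributions, summed over checkpoints (TracIn-style)."""
    scores = {}
    for E, w in checkpoints:
        g_train = embedding_grads(E, w, *train)
        g_test = embedding_grads(E, w, *test)
        # Only words present in both inputs have nonzero gradient overlap.
        for t in g_train.keys() & g_test.keys():
            scores[t] = scores.get(t, 0.0) + lr * float(g_train[t] @ g_test[t])
    return scores


def tracin_we(checkpoints, lr, train, test):
    """Input-level score: the sum of the word-level contributions."""
    return float(sum(word_level_scores(checkpoints, lr, train, test).values()))


# Two made-up "checkpoints" of (embedding matrix, output weights).
rng = np.random.default_rng(0)
checkpoints = [(rng.normal(size=(6, 4)), rng.normal(size=4)) for _ in range(2)]

train_example = ([1, 2, 3], 1)  # (token ids, label)
test_example = ([3, 4], 1)

print(word_level_scores(checkpoints, 0.1, train_example, test_example))
print(tracin_we(checkpoints, 0.1, train_example, test_example))
```

In this toy setting, a training and a test input with no word in common receive a score of exactly zero, and the dictionary returned by word_level_scores is the word-level breakdown the abstract refers to as a debugging aid. A real model would obtain the embedding gradients by automatic differentiation rather than a closed-form expression.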
