Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation on the task of native language identification. We find that standard text classifiers that perform well on the test set end up learning topical features that are confounds of the prediction task (e.g., if the input text mentions Sweden, the classifier predicts that the author's native language is Swedish). We propose a method that represents the latent topical confounds, and a model that "unlearns" confounding features by predicting both the label of the input text and the confound. The two predictors are trained adversarially, in an alternating fashion, to learn a text representation that predicts the correct label but is less prone to using information about the confound. We show that this model generalizes better and learns features that are indicative of writing style rather than content.
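To make the alternating adversarial scheme concrete, below is a minimal sketch in PyTorch (a framework assumption; the abstract does not prescribe one). The encoder architecture, layer sizes, number of labels and topical confounds, and the adversarial weight `alpha` are all hypothetical, chosen only to illustrate the two alternating updates: first fit the confound predictor on a frozen text representation, then update the encoder and label predictor so that the label is predicted well while the confound becomes harder to predict.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: all module names, sizes, and the weight
# `alpha` are assumptions, not taken from the paper.

class Encoder(nn.Module):
    """Maps a bag-of-words input to a shared text representation."""
    def __init__(self, vocab_size=10000, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())

    def forward(self, x):
        return self.net(x)

encoder = Encoder()
label_head = nn.Linear(256, 11)     # e.g., 11 native languages
confound_head = nn.Linear(256, 50)  # e.g., 50 latent topical confounds
xent = nn.CrossEntropyLoss()

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(label_head.parameters()), lr=1e-3)
opt_conf = torch.optim.Adam(confound_head.parameters(), lr=1e-3)
alpha = 0.1  # strength of the adversarial term (hypothetical)

def training_step(x, y_label, y_confound):
    # Step 1: fit the confound predictor on the frozen representation.
    with torch.no_grad():
        h = encoder(x)
    conf_loss = xent(confound_head(h), y_confound)
    opt_conf.zero_grad()
    conf_loss.backward()
    opt_conf.step()

    # Step 2: update encoder + label predictor. The encoder is pushed to
    # predict the correct label while *increasing* the confound
    # predictor's loss, i.e., unlearning topical information.
    h = encoder(x)
    loss = xent(label_head(h), y_label) \
        - alpha * xent(confound_head(h), y_confound)
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()

# Toy usage on a random minibatch of 32 documents.
x = torch.rand(32, 10000)
y_label = torch.randint(0, 11, (32,))
y_confound = torch.randint(0, 50, (32,))
training_step(x, y_label, y_confound)
```

Keeping separate optimizers for the confound head and for the encoder/label head is what makes the training adversarial rather than joint: the confound predictor stays a strong probe of the representation, while the encoder is rewarded for making that probe fail.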