Selective rationalization aims to produce decisions along with rationales (e.g., text highlights or word alignments between two sentences). Commonly, rationales are modeled as stochastic binary masks, requiring sampling-based gradient estimators, which complicates training and requires careful hyperparameter tuning. Sparse attention mechanisms are a deterministic alternative, but they lack a way to regularize the rationale extraction (e.g., to control the sparsity of a text highlight or the number of alignments). In this paper, we present a unified framework for deterministic extraction of structured explanations via constrained inference on a factor graph, forming a differentiable layer. Our approach greatly eases training and rationale regularization, generally outperforming previous work in terms of predictive performance and plausibility of the extracted rationales. We further provide a comparative study of stochastic and deterministic methods for rationale extraction on classification and natural language inference tasks, jointly assessing their predictive power, quality of the explanations, and model variability.
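As a rough illustration of the deterministic, sparsity-inducing extraction the abstract contrasts with stochastic binary masks, the sketch below implements sparsemax (Martins & Astudillo, 2016), a projection onto the probability simplex that yields exact zeros: the nonzero support over tokens can be read off as a highlight without any sampling-based gradient estimator. This is only a minimal sketch of the general idea, not the constrained factor-graph layer proposed in the paper; the scores and token setup are illustrative.

```python
import torch


def sparsemax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparse projection onto the simplex (Martins & Astudillo, 2016).

    Unlike softmax, the output can contain exact zeros, so the support of
    the distribution gives a deterministic, differentiable token selection.
    """
    z, _ = torch.sort(scores, dim=dim, descending=True)
    k = torch.arange(1, scores.size(dim) + 1, device=scores.device, dtype=scores.dtype)
    shape = [1] * scores.dim()
    shape[dim] = -1
    k = k.view(shape)                              # broadcastable rank index
    z_cumsum = z.cumsum(dim=dim)
    support = (1 + k * z) > z_cumsum               # sorted entries that stay nonzero
    k_support = support.sum(dim=dim, keepdim=True).to(scores.dtype)
    # threshold tau such that the clipped scores sum to one
    tau = (torch.gather(z_cumsum, dim, k_support.long() - 1) - 1) / k_support
    return torch.clamp(scores - tau, min=0.0)


# Toy usage: a deterministic sparse highlight over 6 token scores (illustrative values).
token_scores = torch.tensor([[2.1, 0.3, -1.0, 1.7, 0.2, -0.5]])
probs = sparsemax(token_scores)    # e.g. [[0.7, 0.0, 0.0, 0.3, 0.0, 0.0]]
highlight = probs > 0              # rationale = support of the sparse distribution
print(probs)
print(highlight)
```

In contrast, a stochastic rationalizer would sample a binary mask (e.g., from per-token Bernoulli distributions) and backpropagate through a REINFORCE-style or relaxed estimator, which is the training complication the abstract refers to.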