Pretrained transformers achieve state-of-the-art results across natural language processing tasks, motivating researchers to investigate their inner mechanisms. One common direction is to understand which features are important for prediction. In this paper, we apply information bottlenecks to analyze the attribution of each input feature to the predictions of a black-box model. We use BERT as our example and evaluate our approach both quantitatively and qualitatively. We show the effectiveness of our method in terms of attribution quality and its ability to provide insight into how information flows through layers. We demonstrate that our technique outperforms two competitive methods in degradation tests on four datasets. Code is available at https://github.com/bazingagin/IBA.
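To make the idea concrete, the sketch below illustrates the general information-bottleneck attribution recipe: insert a bottleneck at an intermediate layer that mixes activations with noise under a learned mask, optimize the mask to preserve the prediction while minimizing the information passed through, and read per-token attribution off the mask's KL cost. This is a minimal, self-contained illustration under stated assumptions, not the released implementation; the stand-in module `layers_above`, the prior statistics `mu` and `sigma`, and the trade-off weight `beta` are hypothetical placeholders.

```python
# Minimal sketch of information-bottleneck attribution (illustrative only).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, hidden, n_classes = 16, 32, 2

# Stand-in for the upper part of the network (e.g. the BERT layers above the
# bottleneck plus the classification head) -- a hypothetical placeholder.
layers_above = torch.nn.Sequential(
    torch.nn.Linear(hidden, hidden), torch.nn.Tanh(),
    torch.nn.Flatten(), torch.nn.Linear(seq_len * hidden, n_classes),
)

x = torch.randn(1, seq_len, hidden)   # intermediate activations for one input
mu, sigma = x.mean(), x.std()         # prior stats; in practice estimated on a corpus
target = torch.tensor([1])            # the class whose prediction we explain
beta = 10.0                           # trades prediction loss against information

def kl_div(x, lam, mu, sigma):
    """Elementwise KL( N(lam*x + (1-lam)*mu, (1-lam)^2 sigma^2) || N(mu, sigma^2) )."""
    mu_z = lam * (x - mu) / sigma     # mean of z in prior-normalized units
    var_z = (1.0 - lam) ** 2          # variance of z in prior-normalized units
    return 0.5 * (var_z + mu_z ** 2 - 1.0) - 0.5 * torch.log(var_z + 1e-8)

alpha = torch.full((1, seq_len, hidden), 5.0, requires_grad=True)  # mask logits
opt = torch.optim.Adam([alpha], lr=1.0)

for _ in range(30):
    lam = torch.sigmoid(alpha)                 # mask in [0, 1]
    eps = mu + sigma * torch.randn_like(x)     # noise drawn from the prior
    z = lam * x + (1.0 - lam) * eps            # noisy bottleneck activations
    loss = F.cross_entropy(layers_above(z), target) \
        + beta * kl_div(x, lam, mu, sigma).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Per-token attribution: information (in nats) each token passes through the
# bottleneck, averaged over hidden dimensions.
attribution = kl_div(x, torch.sigmoid(alpha), mu, sigma).mean(-1).squeeze(0)
print(attribution)
```

In an actual application, the prior statistics would be estimated from activations at the chosen layer over a corpus, and the bottleneck would sit between real BERT layers rather than the toy stack used here.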