Word embeddings are a powerful natural language processing technique, but they are extremely difficult to interpret. To enable interpretable NLP models, we create vectors where each dimension is inherently interpretable. By inherently interpretable, we mean a system where each dimension is associated with some human-understandable hint that can describe the meaning of that dimension. In order to create more interpretable word embeddings, we transform pretrained dense word embeddings into sparse embeddings. These new embeddings are inherently interpretable: each of their dimensions is created from and represents a natural language word or specific grammatical concept. We construct these embeddings through sparse coding, where each vector in the basis set is itself a word embedding. Therefore, each dimension of our sparse vectors corresponds to a natural language word. We also show that models trained using these sparse embeddings can achieve good performance and are more interpretable in practice, including through human evaluations.
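To make the construction concrete, the following is a minimal sketch (not the authors' code) of the idea the abstract describes: dense word embeddings are sparse-coded against a dictionary whose atoms are themselves word embeddings, so each sparse dimension maps to a natural language word. The vocabulary, basis words, and random vectors below are illustrative placeholders; in practice the rows of `X` and `D` would come from a pretrained model such as GloVe or word2vec.

```python
# Sketch: sparse-code dense word embeddings against a basis of word embeddings,
# so each nonzero coordinate of the sparse code is tied to a basis word.
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)

d = 50                                        # embedding dimensionality
vocab = ["cat", "dog", "run", "blue"]         # words to re-encode (placeholder)
basis_words = ["animal", "motion", "color"]   # interpretable basis words (placeholder)

# Stand-ins for pretrained dense embeddings; replace with real vectors.
X = rng.normal(size=(len(vocab), d))          # dense vectors to explain
D = rng.normal(size=(len(basis_words), d))    # dictionary: one embedding per basis word

# Normalize dictionary atoms, as is conventional in sparse coding.
D /= np.linalg.norm(D, axis=1, keepdims=True)

# Solve min_A ||X - A D||^2 + alpha * ||A||_1 for the sparse codes A.
A = sparse_encode(X, D, algorithm="lasso_lars", alpha=0.1)

# Each nonzero entry A[i, j] says basis word j helps explain vocab word i,
# which is what makes the resulting dimensions human-readable.
for i, word in enumerate(vocab):
    active = [(basis_words[j], round(float(A[i, j]), 3))
              for j in np.flatnonzero(A[i])]
    print(word, "->", active)
```

The key design point this sketch mirrors is that the dictionary rows are word embeddings rather than arbitrary learned atoms, so a sparse coefficient on dimension j can be read directly as "this word's meaning draws on basis word j."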