We propose a novel non-parametric, training-free language model, the Non-Parametric Pairwise Attention Random Walk Model (NoPPA), which generates sentence embeddings using only pre-trained word embeddings and pre-counted word frequencies. To the best of our knowledge, this study is the first successful attempt to break the bag-of-words constraint with a non-parametric attention mechanism. We evaluate our method on eight downstream classification tasks. The experimental results show that NoPPA outperforms all bag-of-words-based methods on every dataset and achieves comparable or better performance than state-of-the-art non-parametric methods on average. Furthermore, visualizations show that NoPPA captures contextual topics, common phrases, and word causalities. Our model is available at https://github.com/JacksonWuxs/NoPPA.