Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately in noisy domains. We propose a novel probabilistic, embedding-level method to improve the robustness of NLP models. Our method, Robust Embeddings via Distributions (RED), incorporates information from both noisy tokens and their surrounding context to obtain distributions over embedding vectors, which can express uncertainty in semantic space more fully than any deterministic point estimate. We evaluate our method on a range of downstream tasks using existing state-of-the-art models in the presence of both natural and synthetic noise, and demonstrate a clear improvement over other embedding-level approaches to robustness from the literature.
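To make the core idea concrete, the sketch below shows one common way to realize distributions over embedding vectors: a diagonal-Gaussian embedding layer in PyTorch, where each token gets a learned mean and per-dimension variance, and embeddings are sampled via the reparameterization trick. This is not the RED architecture (the abstract does not specify it, and in particular this sketch omits RED's use of surrounding context); all names and design choices here are illustrative assumptions only.

```python
# Minimal sketch of a probabilistic (Gaussian) embedding layer.
# Hypothetical illustration of distribution-valued embeddings in general,
# NOT the paper's RED method, whose architecture is not given here.

import torch
import torch.nn as nn


class GaussianEmbedding(nn.Module):
    """Maps token ids to samples from a diagonal Gaussian in embedding space."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.mu = nn.Embedding(vocab_size, dim)       # per-token mean vector
        self.log_var = nn.Embedding(vocab_size, dim)  # per-token log-variance (uncertainty)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        mu = self.mu(token_ids)
        std = torch.exp(0.5 * self.log_var(token_ids))
        # Reparameterization trick: sample eps ~ N(0, I) so gradients
        # flow through both the mean and the variance parameters.
        eps = torch.randn_like(std)
        return mu + eps * std


# Usage: a noisy or rare token can learn a large variance, signalling that
# its meaning is uncertain and letting downstream layers discount it.
emb = GaussianEmbedding(vocab_size=30_000, dim=128)
sample = emb(torch.tensor([[17, 4052, 911]]))
print(sample.shape)  # torch.Size([1, 3, 128])
```

The advantage over a deterministic lookup table is that uncertainty becomes an explicit, learnable quantity rather than being collapsed into a single point in semantic space.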