The introduction of embedding techniques has significantly advanced the field of Natural Language Processing. Many of the proposed solutions target word-level encoding; however, in recent years, new mechanisms have emerged for treating information at a higher level of aggregation, such as the sentence and document level. In this work, we specifically address the sentence embedding problem and present the Static Fuzzy Bag-of-Words (SFBoW) model. Our model is a refinement of the Fuzzy Bag-of-Words approach that provides sentence embeddings with a predefined dimension. SFBoW achieves competitive performance on Semantic Textual Similarity benchmarks while requiring low computational resources.