Semantic similarity analysis and modeling is a fundamental task in many pioneering applications of natural language processing today. Owing to their strength at sequential pattern recognition, neural networks such as RNNs and LSTMs have achieved satisfactory results in semantic similarity modeling. However, these solutions are considered inefficient because they cannot process information non-sequentially, which leads to improper extraction of context. Transformers are the state-of-the-art architecture thanks to advantages such as non-sequential data processing and self-attention. In this paper, we perform semantic similarity analysis and modeling on the U.S. Patent Phrase to Phrase Matching dataset using both traditional and transformer-based techniques. We experiment with four variants of Decoding-enhanced BERT (DeBERTa) and enhance their performance through K-Fold cross-validation. The experimental results demonstrate our methodology's improved performance over traditional techniques, with an average Pearson correlation score of 0.79.
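The evaluation protocol described above can be sketched in a few lines: train on each K-Fold split, score the held-out fold with Pearson correlation, and average across folds. This is a minimal illustration only; it uses a toy Ridge regressor and synthetic features as stand-ins for the paper's fine-tuned DeBERTa models and the actual phrase-pair data, which are assumptions not taken from the source.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Synthetic stand-in for phrase-pair features and gold similarity scores
# (the paper fine-tunes DeBERTa on the USPTO phrase pairs instead).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=100)

fold_scores = []
for train_idx, val_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])   # toy model, not DeBERTa
    preds = model.predict(X[val_idx])
    r, _ = pearsonr(y[val_idx], preds)                # per-fold Pearson correlation
    fold_scores.append(r)

# The paper reports the average Pearson correlation across folds.
avg_pearson = float(np.mean(fold_scores))
print(round(avg_pearson, 3))
```

The number of splits (4) mirrors the four model variants' evaluation setup only loosely; the key point is that the reported 0.79 is an average of per-fold Pearson correlations, not a single train/test score.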