Measuring the similarity between patents is an essential step in ensuring the novelty of an innovation. However, many existing approaches to measuring patent similarity still rely on experts classifying patents manually. Other studies have proposed automated methods, but most of them focus solely on the semantic similarity of patents. To address these limitations, we propose a hybrid method for automatically measuring the similarity between patents that considers both semantic and technological similarity. We measure semantic similarity from patent texts using BERT, compute technological similarity from IPC codes using the Jaccard similarity, and combine the two by assigning a weight to each similarity measure. Our evaluation results show that the proposed method outperforms a baseline that considers semantic similarity only.
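The weighted hybridization can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes BERT embeddings have already been computed for each patent text, and it uses cosine similarity between those embeddings as the semantic score; the abstract does not name the vector similarity. The two scores are combined with a single weight `alpha`, a hypothetical parameter, because the actual weights are not given here. The function names are also hypothetical.

```python
import math
from typing import Sequence, Set


def jaccard_similarity(codes_a: Set[str], codes_b: Set[str]) -> float:
    """Technological similarity: |A ∩ B| / |A ∪ B| over two patents' IPC code sets."""
    union = codes_a | codes_b
    if not union:
        # Assumption: two patents with no IPC codes are treated as dissimilar.
        return 0.0
    return len(codes_a & codes_b) / len(union)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Semantic similarity between two precomputed text embeddings (e.g. BERT vectors).

    Assumption: cosine similarity is used here as the embedding similarity.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_similarity(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    ipc_a: Set[str],
    ipc_b: Set[str],
    alpha: float = 0.5,  # hypothetical weight on the semantic score
) -> float:
    """Weighted sum: alpha * semantic + (1 - alpha) * technological."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    semantic = cosine_similarity(emb_a, emb_b)
    technological = jaccard_similarity(ipc_a, ipc_b)
    return alpha * semantic + (1.0 - alpha) * technological
```

For example, two patents with identical embeddings `[1.0, 0.0]` and IPC code sets `{"G06F 16/33", "G06F 40/30"}` and `{"G06F 16/33", "G06N 3/08"}` share one of three distinct codes. Their Jaccard similarity is therefore 1/3, and with `alpha=0.5` the hybrid score is 0.5 × 1.0 + 0.5 × 1/3 ≈ 0.667. One design caveat applies. Cosine similarity between BERT embeddings can in principle be negative, whereas the Jaccard similarity always lies in [0, 1]. A real implementation may therefore need to rescale the two scores before weighting them.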