The network of patents citing prior art arises from the legal obligation of patent applicants to properly disclose their invention. One way to study the relationship between current patents and their antecedents is to analyze the similarity between their textual elements. Patent similarity indicators have been decreasing steadily since the mid-1970s. This work investigates the drivers of this downward trend through a generalized additive model and, at the same time, proposes a computationally efficient way to derive similarity scores across pairs of patent citations by leveraging state-of-the-art tools in Natural Language Processing. We find that this non-linear modelling technique lets us distinguish between distinct, temporally varying drivers of patent similarity levels that account for more variation in the data ($R^2\sim 18\%$) than the previous literature. Moreover, with these corrections in place, we conclude that the trend in similarity shows a different pattern from the one presented in previous studies.
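To make the idea of scoring citation pairs concrete, here is a minimal sketch. The abstract does not say which text representation or similarity measure is used, so the cosine similarity, the function names, and the toy vectors below are all assumptions for illustration. The sketch normalises each patent vector once, so that every citing/cited pair then costs only a single dot product.

```python
import math


def normalise(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def citation_similarities(embeddings, citation_pairs):
    """Cosine similarity for each (citing, cited) pair.

    embeddings: dict mapping a patent id to its text vector (hypothetical input).
    citation_pairs: iterable of (citing_id, cited_id) tuples.
    """
    # Normalise every patent once, so each pair costs a single dot product.
    unit = {pid: normalise(vec) for pid, vec in embeddings.items()}
    return {
        (citing, cited): sum(a * b for a, b in zip(unit[citing], unit[cited]))
        for citing, cited in citation_pairs
    }


# Toy vectors standing in for real text embeddings (made-up values).
toy_embeddings = {
    "P1": [1.0, 0.0],
    "P2": [1.0, 1.0],
    "P3": [0.0, 2.0],
}
toy_pairs = [("P2", "P1"), ("P3", "P1"), ("P3", "P2")]
scores = citation_similarities(toy_embeddings, toy_pairs)
```

With real data, the dictionary of vectors would come from whatever language model produces the patent text representations, and the resulting scores would be the response variable of the additive model.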