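Introduction: The "Knowledge Neural Network" (KNN) is a method of organizing knowledge based on the neural network model. In practice, knowledge is usually expressed in writing and resides in the content and logical relationships of natural-language text. Even abstract mathematical knowledge, although defined and derived in symbolic notation, still depends on natural language for explanation; without it, such knowledge is hard for people to understand. In a KNN, a unit of "knowledge" is a text that describes it, such as a web page, a Word file, or a PDF document. A piece of knowledge can be described along several dimensions (or only one). For example, a description of a disease may cover symptoms, causes, diagnostic methods, and treatments. To build a KNN, text sources (web pages, Word files, PDFs, and so on) are first "knowledge-ized", that is, converted into semi-structured "knowledge records". Relatedness between records is then computed so that related knowledge is linked, and scattered, fragmentary, unordered knowledge is clustered by relatedness into an interconnected "knowledge neural network".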
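The build steps described above (semi-structured records, pairwise relatedness, linking, clustering) can be illustrated with a minimal Python sketch. The text does not say how relatedness is measured, so the Jaccard token overlap, the `threshold` value, the function names, and the toy disease records below are all assumptions made for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two token sets (an assumed relatedness measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(records, threshold=0.2):
    """Link every pair of records whose relatedness reaches the threshold."""
    # Flatten each record's dimensions (symptoms, treatment, ...) into one token set.
    tokens = {
        name: set(" ".join(fields.values()).lower().split())
        for name, fields in records.items()
    }
    edges = {name: set() for name in records}
    for a, b in combinations(records, 2):
        if jaccard(tokens[a], tokens[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def clusters(edges):
    """Group linked records into connected components."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(edges[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical knowledge records, each described along two dimensions.
records = {
    "influenza": {"symptoms": "fever cough fatigue", "treatment": "rest fluids antivirals"},
    "common cold": {"symptoms": "cough sneezing fatigue", "treatment": "rest fluids"},
    "ankle sprain": {"symptoms": "swelling pain bruising", "treatment": "ice compression elevation"},
}
network = build_network(records)
groups = clusters(network)
```

In this toy data the two respiratory illnesses share enough vocabulary to be linked, while the sprain record stays isolated. A real system would likely replace token overlap with a stronger text-similarity measure.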
This study provides an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity, and presents a hybrid framework for leveraging the resulting p2p similarity in applications such as semantic search and automated patent classification. We create embeddings from patent claims using Sentence-BERT (SBERT), and we leverage SBERT's efficiency in computing embedding distance measures to map p2p similarity in large sets of patent data. We deploy our framework for classification with a simple K-Nearest Neighbors (KNN) model that predicts the Cooperative Patent Classification (CPC) of a patent based on the class assignments of the K patents with the highest p2p similarity. We thereby validate that the p2p similarity captures the patents' technological features in terms of CPC overlap, and at the same time demonstrate the usefulness of this approach for automatic patent classification based on text data. Furthermore, the presented classification framework is simple, and its results are easy for end-users to interpret and evaluate. In the out-of-sample model validation, we perform a multi-label prediction of all assigned CPC classes on the subclass (663) level on 1,492,294 patents with an accuracy of 54% and an F1 score > 66%, which suggests that our model outperforms the current state of the art in text-based multi-label and multi-class patent classification. We also discuss the applicability of the presented framework to semantic IP search, patent landscaping, and technology intelligence. Finally, we outline a future research agenda covering the use of multi-source patent embeddings, their appropriateness across applications, and the improvement and validation of patent embeddings through domain-expert curated Semantic Textual Similarity (STS) benchmark datasets.
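The KNN classification step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's implementation: the three-dimensional vectors stand in for SBERT claim embeddings, cosine similarity stands in for the p2p similarity measure, and the `min_share` voting rule, the function names, and the toy patents are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def predict_cpc(query, corpus, k=3, min_share=0.5):
    """Return CPC labels held by at least min_share of the k most similar patents."""
    ranked = sorted(corpus, key=lambda p: cosine(query, p["embedding"]), reverse=True)
    neighbors = ranked[:k]
    # Multi-label vote: each neighbor contributes one vote per CPC class it carries.
    votes = Counter(label for p in neighbors for label in p["cpc"])
    return sorted(label for label, n in votes.items() if n / len(neighbors) >= min_share)


# Toy corpus: embeddings are placeholders, not real SBERT output.
corpus = [
    {"id": "P1", "embedding": [0.9, 0.1, 0.0], "cpc": {"H04L", "G06F"}},
    {"id": "P2", "embedding": [0.8, 0.2, 0.1], "cpc": {"H04L"}},
    {"id": "P3", "embedding": [0.7, 0.3, 0.0], "cpc": {"G06F", "H04L"}},
    {"id": "P4", "embedding": [0.0, 0.1, 0.9], "cpc": {"A61K"}},
    {"id": "P5", "embedding": [0.1, 0.0, 0.8], "cpc": {"A61K", "C07D"}},
]
predicted = predict_cpc([0.85, 0.15, 0.05], corpus, k=3)
```

Because each prediction is traceable to a handful of named neighbor patents, an end-user can inspect those neighbors directly, which is the interpretability property the abstract points to. At the scale reported above, the exhaustive sort would presumably give way to an approximate nearest-neighbor index.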