Generative models, such as GPT-2, have recently demonstrated impressive results. A fundamental question we would like to address is: where did the generated text come from? This work is our initial effort toward answering that question through prior art search. The purpose of the prior art search is to find the most similar prior text in the training data of GPT-2. We take a reranking approach and apply it to the patent domain. Specifically, we pre-train GPT-2 models from scratch using patent data from the USPTO. The input for the prior art search is the patent text generated by the GPT-2 model. We also pre-train BERT models from scratch to convert patent text into embeddings. The reranking steps are: (1) search for the most similar text in the training data of GPT-2 using a bag-of-words ranking approach (BM25), (2) convert the retrieved text into BERT embeddings, and (3) produce the final result by ranking the BERT embeddings based on their similarities to the patent text generated by GPT-2. The experiments in this work show that such reranking outperforms ranking with embeddings alone. However, our mixed results also indicate that calculating semantic similarities among long text spans remains challenging. To our knowledge, this work is the first to implement a reranking system that retrospectively identifies the most similar inputs to a GPT model based on its output.
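To make the three-step pipeline concrete, the following is a minimal sketch, not the paper's implementation: it assumes the rank_bm25 package for BM25 retrieval, a Hugging Face Transformers interface to a pre-trained BERT model, and mean pooling over the last hidden states as the embedding function. The checkpoint path "path/to/patent-bert" and the helper names are hypothetical placeholders.

```python
import numpy as np
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoModel, AutoTokenizer

# Hypothetical checkpoint path; in the paper, BERT is pre-trained from scratch on patent text.
tokenizer = AutoTokenizer.from_pretrained("path/to/patent-bert")
model = AutoModel.from_pretrained("path/to/patent-bert")

def embed(text: str) -> np.ndarray:
    """Mean-pool the last hidden states into one L2-normalized vector (one possible choice)."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, dim)
    vec = hidden.mean(dim=1).squeeze(0).numpy()
    return vec / np.linalg.norm(vec)

def rerank(generated_text: str, training_corpus: list[str], top_k: int = 100) -> list[str]:
    # Step 1: BM25 retrieves the top_k most similar passages from the GPT-2 training data.
    bm25 = BM25Okapi([doc.split() for doc in training_corpus])
    candidates = bm25.get_top_n(generated_text.split(), training_corpus, n=top_k)
    # Steps 2-3: embed the candidates with BERT and rank them by cosine similarity
    # to the embedding of the GPT-2-generated patent text.
    query_vec = embed(generated_text)
    scored = [(float(np.dot(query_vec, embed(doc))), doc) for doc in candidates]
    return [doc for _, doc in sorted(scored, reverse=True)]
```

The two-stage design reflects the reranking idea: a cheap lexical pass (BM25) narrows the training data to a candidate set, and the more expensive embedding similarity only reorders those candidates.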