用于在AI软件中开发域语言模型的统一知识图服务 (A Unified Knowledge Graph Service for Developing Domain Language Models in AI Software) - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · 知识 (knowledge) · 图 · entity ·

2022 年 12 月 10 日

A Unified Knowledge Graph Service for Developing Domain Language Models in AI Software

翻译：用于在AI软件中开发域语言模型的统一知识图服务

Ruiqing Ding,Xiao Han,Leye Wang

from arxiv, 12 pages

Natural Language Processing (NLP) is one of the core techniques in AI software. As AI is being applied to more and more domains, how to efficiently develop high-quality domain-specific language models becomes a critical question in AI software engineering. Existing domain-specific language model development processes mostly focus on learning a domain-specific pre-trained language model (PLM); when training the domain task-specific language model based on PLM, only a direct (and often unsatisfactory) fine-tuning strategy is adopted commonly. By enhancing the task-specific training procedure with domain knowledge graphs, we propose KnowledgeDA, a unified and low-code domain language model development service. Given domain-specific task texts input by a user, KnowledgeDA can automatically generate a domain-specific language model following three steps: (i) localize domain knowledge entities in texts via an embedding-similarity approach; (ii) generate augmented samples by retrieving replaceable domain entity pairs from two views of both knowledge graph and training data; (iii) select high-quality augmented samples for fine-tuning via confidence-based assessment. We implement a prototype of KnowledgeDA to learn language models for two domains, healthcare and software development. Experiments on five domain-specific NLP tasks verify the effectiveness and generalizability of KnowledgeDA. (Code is publicly available at https://github.com/RuiqingDing/KnowledgeDA.)

翻译：自然语言处理(NLP)是AI软件的核心技术之一。随着AI应用到越来越多的领域,如何有效开发高质量的特定域语言模型成为AI软件工程中的一个关键问题。现有的特定域语言模式开发过程主要侧重于学习一个特定域的预先培训语言模型(PLM);在培训基于PLM的具体域任务语言模型时,通常只采用直接(而且往往不能令人满意)的微调战略。通过用域知识图表加强具体任务培训程序,我们建议“知识开发工具”,一个统一和低编码的域域语言模型开发服务。鉴于用户对域特定任务文本的投入,“知识开发工具”可自动产生一个特定域语言模型,遵循三个步骤:(一) 通过嵌入式-类似方法,将文本中的域知识实体本地化;(二) 从知识图表和培训数据两种观点中重新定位可替换的域实体配对,产生更多的样本;(三) 选择高质量的强化样本,通过基于信任的评估进行微调。我们实施了知识开发数据数据库原型,以学习两个域的域、ROD/软件开发。

0

相关内容

语言模型化

语言模型化

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

51+阅读 · 2020年5月3日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

基于虚拟并发数据流和有效控制数据驱动的无线网络化系统协同设计

国家自然科学基金

0+阅读 · 2014年12月31日

弹丸的随机动态稳定性研究

国家自然科学基金

0+阅读 · 2013年12月31日

星形胶质细胞RORα在多巴胺神经元损伤中的作用及机制

国家自然科学基金

0+阅读 · 2013年12月31日

Diversin介导非小细胞肺癌长春瑞滨耐药的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Persephin在急性肾损伤中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

IL-10调控Th17细胞在慢性HBV感染肝损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

专利许可谈判行为与决策模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

TRAIL协同IER3调节NF-κB信号通路介导肝癌细胞凋亡的相关机制研究

国家自然科学基金

1+阅读 · 2012年12月31日

MCM3-SYF2复合物对cyclin D1-CDKs调节在星形胶质细胞炎症激活中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

Wnt/β-catenin通路在内皮抑素抑制动脉粥样硬化斑块内微血管新生中的作用及其机制

国家自然科学基金

0+阅读 · 2011年12月31日

CatAlyst: Domain-Extensible Intervention for Preventing Task Procrastination Using Large Generative Models

Arxiv

0+阅读 · 2023年2月11日

Adversarial Transformer Language Models for Contextual Commonsense Inference

Arxiv

0+阅读 · 2023年2月10日

Translating Natural Language to Planning Goals with Large-Language Models

Arxiv

1+阅读 · 2023年2月10日

Adversarial Robustness of Representation Learning for Knowledge Graphs

Arxiv

10+阅读 · 2022年9月30日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Explainable Reasoning over Knowledge Graphs for Recommendation

Arxiv

11+阅读 · 2018年11月12日

Which Knowledge Graph Is Best for Me?

Arxiv

11+阅读 · 2018年9月28日

Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks

Arxiv

17+阅读 · 2018年6月5日

VIP会员

文章信息

相关主题

语言模型化

知识 (knowledge)

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

51+阅读 · 2020年5月3日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《物联网（IoT）中的无人机通信高效控制》135页

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

中程单向攻击无人机的战略意义：俄乌战争启示

《面向无人机集群的避障动态传感器覆盖算法》最新38页

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

CatAlyst: Domain-Extensible Intervention for Preventing Task Procrastination Using Large Generative Models

Arxiv

0+阅读 · 2023年2月11日

Adversarial Transformer Language Models for Contextual Commonsense Inference

Arxiv

0+阅读 · 2023年2月10日

Translating Natural Language to Planning Goals with Large-Language Models

Arxiv

1+阅读 · 2023年2月10日

Adversarial Robustness of Representation Learning for Knowledge Graphs

Arxiv

10+阅读 · 2022年9月30日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Explainable Reasoning over Knowledge Graphs for Recommendation

Arxiv

11+阅读 · 2018年11月12日

Which Knowledge Graph Is Best for Me?

Arxiv

11+阅读 · 2018年9月28日

Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks

Arxiv

17+阅读 · 2018年6月5日

相关基金

基于虚拟并发数据流和有效控制数据驱动的无线网络化系统协同设计

国家自然科学基金

0+阅读 · 2014年12月31日

弹丸的随机动态稳定性研究

国家自然科学基金

0+阅读 · 2013年12月31日

星形胶质细胞RORα在多巴胺神经元损伤中的作用及机制

国家自然科学基金

0+阅读 · 2013年12月31日

Diversin介导非小细胞肺癌长春瑞滨耐药的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Persephin在急性肾损伤中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

IL-10调控Th17细胞在慢性HBV感染肝损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

专利许可谈判行为与决策模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

TRAIL协同IER3调节NF-κB信号通路介导肝癌细胞凋亡的相关机制研究

国家自然科学基金

1+阅读 · 2012年12月31日

MCM3-SYF2复合物对cyclin D1-CDKs调节在星形胶质细胞炎症激活中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

Wnt/β-catenin通路在内皮抑素抑制动脉粥样硬化斑块内微血管新生中的作用及其机制

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员