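Overview: Natural language generation (NLG) has a wide range of applications, including chatbots, story generation, and data description, and the NLG process involves many different techniques. This article collects links to projects, tools, papers, and learning materials on NLG applications and techniques. It covers datasets, dialogue systems, evaluation, grammars, papers, videos, and more.
Original link: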
https://github.com/tokenmill/awesome-nlg
Datasets
E2E: a dataset focused on end-to-end, data-driven natural language generation methods
http://www.macs.hw.ac.uk/InteractionLab/E2E
Neural-Wikipedian: this dataset includes …
https://github.com/pvougiou/Neural-Wikipedian
WebNLG: the corpus discussed in the INLG 2018 paper "Enriching the WebNLG corpus"
https://github.com/ThiagoCF05/webnlg
YelpNLG: a natural language resource built from restaurant reviews
https://nlds.soe.ucsc.edu/yelpnlg
Dialogue systems
Chatito: uses a DSL to generate datasets for AI chatbots, natural language tasks, named entity recognition, or text classification
https://github.com/rodrigopivi/Chatito
RNNLG: an open-source benchmark for natural language generation application domains
https://github.com/shawnwun/RNNLG
NNDIAL: builds end-to-end trainable, task-oriented dialogue system models
https://github.com/shawnwun/NNDIAL
Evaluation
NLG-Eval: evaluation code for natural language generation tasks
https://github.com/Maluuba/nlg-eval
VizSeq: a visualization tool for text generation tasks
https://github.com/facebookresearch/vizseq
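To give a feel for what automatic NLG evaluation measures, here is a minimal sketch of clipped n-gram precision, the building block of overlap metrics such as BLEU. It is an illustration only and is not taken from NLG-Eval or VizSeq; the function names `ngrams` and `clipped_precision` are hypothetical, and tokenization is assumed to be simple whitespace splitting.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, reference, n=1):
    """Clipped n-gram precision of a generated sentence against one reference.

    Assumption: both inputs are plain strings tokenized on whitespace.
    """
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Credit each hypothesis n-gram at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat is on the mat"
    print(clipped_precision(generated, reference, n=1))
    print(clipped_precision(generated, reference, n=2))
```

Real evaluation toolkits combine several n-gram orders, add a brevity penalty, and support multiple references; this sketch covers only the single-reference precision step.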
Story generation
Random Story Generator: generates random stories using natural language generation techniques
https://github.com/aherriot/story-generator
Tracery: a story generator written in JavaScript
https://github.com/galaxykate/tracery
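Grammar-based story generators of this kind typically expand a start rule by recursively replacing placeholders with randomly chosen alternatives. The sketch below illustrates that idea in Python. It is not the Tracery library or its API: it only borrows the `#symbol#` placeholder notation and the conventional `origin` start rule, and the `expand` function and the sample grammar are hypothetical.

```python
import random
import re


def expand(grammar, symbol="origin", rng=None):
    """Expand a rule by picking one alternative and filling its placeholders."""
    if rng is None:
        rng = random.Random()
    text = rng.choice(grammar[symbol])
    # Replace every #name# placeholder with a recursive expansion of that rule.
    return re.sub(r"#(\w+)#", lambda m: expand(grammar, m.group(1), rng), text)


# Hypothetical example grammar: each key maps to a list of alternatives.
grammar = {
    "origin": ["The #animal# #action# the #place#."],
    "animal": ["fox", "owl"],
    "action": ["explored", "guarded"],
    "place": ["forest", "village"],
}

if __name__ == "__main__":
    # Passing a seeded generator makes the output reproducible.
    print(expand(grammar, rng=random.Random(0)))
```

Keeping the grammar as plain data means new story variations can be added by editing the rule lists, without touching the expansion code.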
Neural natural language generation
Graph2Text
https://github.com/diegma/graph-2-text
Image Caption Generator: implemented with TensorFlow
https://github.com/neural-nuts/image-caption-generator
PPLM: Plug and Play Language Model for text generation
https://github.com/uber-research/PPLM
textgenrnn: quickly trains text-generating neural networks of any size and complexity with a few lines of code
https://github.com/minimaxir/textgenrnn
Transformers: state-of-the-art natural language processing models implemented for TensorFlow 2.0 and PyTorch
https://github.com/huggingface/transformers
Summary Generation From Structured Data: generates natural language from structured datasets
https://github.com/akanimax/natural-language-summary-generation-from-structured-data
Papers
A Closer Look at Recent Results of Verb Selection for Data-to-Text NLG
https://www.inlg2019.com/assets/papers/178_Paper.pdf
A Personalized Data-to-Text Support Tool for Cancer Patients
https://www.inlg2019.com/assets/papers/28_Paper.pdf
Controlling Contents in Data-to-Document Generation with Human-Designed Topic Labels
https://www.inlg2019.com/assets/papers/79_Paper.pdf
Hotel Scribe: Generating High Variation Hotel Descriptions
https://www.inlg2019.com/assets/papers/44_Paper.pdf
Natural Language Generation enhances human decision-making with uncertain information
https://arxiv.org/pdf/1606.03254.pdf
NLP - Text Generation Reading List
https://github.com/zhongpeixiang/AI-NLP-Paper-Readings/blob/master/NLP/NLP_generation.md
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
https://arxiv.org/pdf/1703.09902.pdf
Revisiting Challenges in Data-to-Text Generation with Fact Grounding
https://www.inlg2019.com/assets/papers/32_Paper.pdf
Products
Accelerated Text: automatically generates descriptions of data that vary in structure and wording
https://github.com/tokenmill/accelerated-text
Twine: an open-source tool for telling interactive, nonlinear stories
http://twinery.org/
Realizers
GenI: a generation API based on Tree Adjoining Grammar
https://github.com/kowey/GenI
JSrealB: a bilingual JavaScript text generation API for web development
https://github.com/rali-udem/JSrealB
SimpleNLG: a Java API for natural language generation
https://github.com/simplenlg/simplenlg
SimpleNLG-DE: the German version of SimpleNLG
https://github.com/sebischair/SimpleNLG-DE
SimpleNLG-EnFr: the English/French version of SimpleNLG
https://github.com/rali-udem/SimpleNLG-EnFr
Videos
Data-To-Text: Generating Textual Summaries of Complex Data - Ehud Reiter
https://www.youtube.com/watch?v=kFRw-wk5YOA
Natural Language Generation (Introduction)
https://www.youtube.com/watch?v=4fjM72lbJaw
Strata Data Conference | The future of natural language generation: 2017-2027
https://www.youtube.com/watch?v=Ls7elVbN8bI
The Quest for Automated Story Generation - Mark Riedl
https://www.youtube.com/watch?v=wgcDUX_BPpk