CREER: A Large-Scale Corpus for Relation Extraction and Entity Recognition

We describe the design and use of the CREER dataset, a large corpus annotated with rich English grammatical and semantic attributes. CREER uses the Stanford CoreNLP Annotator to capture rich linguistic structure from Wikipedia plain text. The dataset follows widely used linguistic and semantic annotation conventions, so that it can be used not only for most natural language processing tasks but also for scaling up the dataset. This large supervised dataset can serve as a basis for improving the performance of NLP tasks in the future. The dataset is publicly available at: https://140.116.82.111/share.cgi?ssid=000 dOJ4
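As a rough illustration of how CoreNLP-style annotations could feed entity recognition and relation extraction, the sketch below pulls entity mentions and relation triples out of a single annotated document. The abstract does not state CREER's file layout, so this assumes records shaped like the JSON that CoreNLP emits when its `ner` and `openie` annotators are enabled. The sample record is hand-written for illustration and is not taken from CREER, and the helper names `extract_entities` and `extract_relations` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hand-written sample shaped like CoreNLP JSON output (assumption: CREER
# records may be laid out differently). This is NOT real CREER data.
sample_doc: Dict[str, Any] = {
    "sentences": [
        {
            "index": 0,
            "entitymentions": [
                {"text": "Marie Curie", "ner": "PERSON"},
                {"text": "Warsaw", "ner": "CITY"},
            ],
            "openie": [
                {
                    "subject": "Marie Curie",
                    "relation": "was born in",
                    "object": "Warsaw",
                },
            ],
        }
    ]
}


def extract_entities(doc: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Collect (mention text, entity type) pairs from every sentence."""
    return [
        (mention["text"], mention["ner"])
        for sentence in doc.get("sentences", [])
        for mention in sentence.get("entitymentions", [])
    ]


def extract_relations(doc: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Collect (subject, relation, object) triples from every sentence."""
    return [
        (triple["subject"], triple["relation"], triple["object"])
        for sentence in doc.get("sentences", [])
        for triple in sentence.get("openie", [])
    ]


entities = extract_entities(sample_doc)
relations = extract_relations(sample_doc)
print(entities)
print(relations)
```

Sentences that lack either annotation are skipped rather than raising an error, which is convenient when only some annotators were run over a given document.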

