Manually creating datasets with human annotators is a laborious task that can lead to biased and inhomogeneous labels. We propose a flexible, semi-automatic framework for labeling data for relation extraction. Furthermore, we provide a dataset of preprocessed sentences from the requirements engineering domain, including both automatically created and hand-crafted labels. In our case study, we compare the human and automatic labels and show that there is substantial overlap between the two annotations.
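The comparison of human and automatic labels described above can be sketched as an inter-annotator agreement computation. The snippet below is a minimal, hypothetical illustration (the paper does not specify its metric): it computes Cohen's kappa, a common chance-corrected agreement score, over made-up relation labels.

```python
# Hypothetical sketch: measuring overlap between human and automatic
# relation labels via Cohen's kappa (assumed metric, not from the paper).
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotations of the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotations label identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each annotation's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Made-up relation labels for six sentences (illustrative only).
human = ["requires", "refines", "none", "requires", "none", "refines"]
auto  = ["requires", "refines", "none", "refines",  "none", "refines"]
print(cohens_kappa(human, auto))  # → 0.75
```

Kappa values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, which is why raw percent overlap alone can be misleading for skewed label distributions.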