The rapid growth of digitized traditional cultural heritage documents has increased the need for their preservation and management, making practical recognition of entities and classification of their types essential. To this end, we propose KoCHET, a Korean cultural heritage corpus for the typical entity-related tasks: named entity recognition (NER), relation extraction (RE), and entity typing (ET). Built with advice from cultural heritage experts and based on the data construction guidelines of government-affiliated organizations, KoCHET consists of 112,362, 38,765, and 113,198 examples for the NER, RE, and ET tasks, respectively, covering all entity types related to Korean cultural heritage. Moreover, unlike existing public corpora, KoCHET permits modified redistribution by both domestic and foreign researchers. Our experimental results demonstrate the practical usability of KoCHET in the cultural heritage domain. We also provide practical insights into KoCHET through statistical and linguistic analysis. Our corpus is freely available at https://github.com/Gyeongmin47/KoCHET.