We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We furthermore release pretrained language models (PLMs), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and thereby facilitate future research. We make a few interesting observations from preliminary experiments on the proposed benchmark suite, already demonstrating its usefulness. First, we find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation. In addition to accelerating Korean NLP research, our comprehensive documentation on creating KLUE will facilitate creating similar resources for other languages in the future. KLUE is available at https://klue-benchmark.com.
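The last observation concerns tokenizer design: pre-tokenizing Korean text at the morpheme level before applying BPE keeps subword merges from crossing morpheme boundaries, which is what helps morpheme-level tagging tasks such as NER. Below is a minimal sketch of this idea, assuming the KoNLPy Mecab analyzer and the HuggingFace tokenizers library are available; the toy corpus, vocabulary size, and special tokens are illustrative placeholders, not KLUE's actual configuration.

```python
# Minimal sketch (not KLUE's exact implementation) of morpheme-aware BPE:
# pre-tokenize Korean text into morphemes, then train/apply BPE on top.
from konlpy.tag import Mecab
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import WhitespaceSplit
from tokenizers.trainers import BpeTrainer

mecab = Mecab()

def morpheme_pretokenize(text: str) -> str:
    # Split the sentence into morphemes and rejoin with spaces so that
    # downstream BPE merges never cross a morpheme boundary.
    return " ".join(mecab.morphs(text))

# Toy corpus for illustration only.
corpus = ["대한민국의 수도는 서울이다.", "한국어 자연어 처리는 재미있다."]
pretokenized = [morpheme_pretokenize(s) for s in corpus]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = WhitespaceSplit()  # respect morpheme boundaries
trainer = BpeTrainer(vocab_size=1000, special_tokens=["[UNK]", "[CLS]", "[SEP]"])
tokenizer.train_from_iterator(pretokenized, trainer)

# Apply the same morpheme pre-tokenization at inference time.
print(tokenizer.encode(morpheme_pretokenize("서울은 대한민국의 수도이다.")).tokens)
```

The design choice this sketch illustrates is that subword units are learned within, not across, morphemes, so token boundaries align with the morpheme-level spans that tagging tasks label.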