Evaluation in natural language processing guides and promotes research on models and methods. In recent years, new evaluation datasets and evaluation tasks have been proposed continuously. At the same time, a series of problems exposed by existing evaluations has restricted the progress of natural language processing technology. Starting from the concept, composition, development, and significance of natural language processing evaluation, this article classifies and summarizes the tasks and characteristics of mainstream evaluations, and then summarizes the problems of natural language processing evaluation and their causes. Finally, drawing on standards for evaluating human language ability, this article puts forward the concept of human-like machine language ability evaluation and proposes a series of basic principles and implementation ideas for it from three aspects: reliability, difficulty, and validity.