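Project title: Research on Automated Essay Scoring Algorithms Based on Deep Neural Networks
Project number: No. 61472391
Project type: General Program
Year of approval: 2015
Discipline: Computer Science
Project author: 何苯 (He Ben)
Affiliation: University of Chinese Academy of Sciences
Funding: RMB 800,000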
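Chinese abstract: In automated essay scoring systems, extracting features that measure the level and quality of an essay is the key technique for ensuring scoring accuracy. Current automated essay scoring algorithms generally rely on shallow features such as essay length and grammatical errors. Limited by the present state of natural language processing, however, these features reflect writing quality effectively only at the lexical and syntactic levels; at the level of semantic content, only fairly shallow features can be engineered, and they cannot correctly represent the contextual semantics of an essay. In preliminary work, the applicant explored a variety of automated essay scoring methods and examined how the features commonly used in scoring models correlate with writing quality and how well they generalize, concluding that the shallowness of the features used severely constrains the robustness and effectiveness of current automated essay scoring techniques. Building on this, the project proposes to construct a new automated essay scoring algorithm based on deep learning: it mines deep semantic features that effectively reflect writing quality, trains a scoring model based on deep neural networks, and evaluates its performance through multi-fold cross-validation on public Chinese and English essay datasets such as ASAP and HSK, with the aim of significantly improving the human-machine agreement rate and robustness of existing scoring systems.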
Chinese keywords: Natural language processing; Deep learning; Automated essay scoring
English abstract: Automated essay scoring (AES) uses pre-defined features to measure the writing quality of essays. However, due to the limits of existing natural language processing techniques, current AES systems can exploit only shallow text features such as essay length and the number of grammatical errors. As a consequence, they cannot represent the exact semantic content of essays, which limits their robustness and effectiveness. In prior work, we investigated the relationship between various pre-defined features and writing quality. Building on those studies, this project aims to develop a novel AES algorithm based on deep neural networks (DNN) by mining deep semantic features that effectively reflect essay writing quality. The essay scoring model trained with the new algorithm will be evaluated by cross-validation on the public ASAP and HSK datasets, which are in English and Chinese, respectively. The proposed approach is expected to significantly improve human-machine agreement and robustness in the experiments.
English keywords: Natural language processing; Deep learning; Automated essay scoring