中文文字简化 (Chinese Lexical Simplification) - 专知论文

会员服务 ·

0

基准 · 词表 · Processing（编程语言） · 置换 · 注意力机制 ·

2020 年 10 月 14 日

Chinese Lexical Simplification

翻译：中文文字简化

Jipeng Qiang,Xinyu Lu,Yun Li,Yunhao Yuan,Yang Shi,Xindong Wu

Lexical simplification has attracted much attention in many languages, which is the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning. Although the richness of vocabulary in Chinese makes the text very difficult to read for children and non-native speakers, there is no research work for Chinese lexical simplification (CLS) task. To circumvent difficulties in acquiring annotations, we manually create the first benchmark dataset for CLS, which can be used for evaluating the lexical simplification systems automatically. In order to acquire more thorough comparison, we present five different types of methods as baselines to generate substitute candidates for the complex word that include synonym-based approach, word embedding-based approach, pretrained language model-based approach, sememe-based approach, and a hybrid approach. Finally, we design the experimental evaluation of these baselines and discuss their advantages and disadvantages. To our best knowledge, this is the first study for CLS task.

翻译：在许多语言中,法律简化引起了人们的极大注意,这是用更简单的替代语言取代某一句中复杂词语的过程。虽然中文词汇丰富,使得儿童和非母语发言人很难读到文字,但中国法律简化(CLS)任务没有研究工作。为避免获取注释方面的困难,我们手工为CLS创建了第一个基准数据集,可用于自动评估法律简化系统。为了进行更彻底的比较,我们提出了五种不同类型的方法作为基准,为复杂的词产生替代对象,该词包括同义词法、以词嵌入法为基础的方法、预先培训的语言模式方法、基于语言的方法和混合方法。最后,我们设计了对这些基线的实验性评估,并讨论了这些基线的利弊。据我们所知,这是CLS任务的首项研究。

0

相关内容

最新《文本简化》综述论文，26页pdf，A Survey on Text Simplification

最新《文本简化》综述论文，26页pdf，A Survey on Text Simplification

专知会员服务

15+阅读 · 2020年8月26日

【机器学习术语宝典】机器学习中英文术语表

【机器学习术语宝典】机器学习中英文术语表

专知会员服务

61+阅读 · 2020年7月12日

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

专知会员服务

195+阅读 · 2020年5月31日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

【技术报告】诺亚开源中文预训练语言模型“哪吒”（NEZHA: Neural Contextualized Representation for Chinese Language Understanding）

【技术报告】诺亚开源中文预训练语言模型“哪吒”（NEZHA: Neural Contextualized Representation for Chinese Language Understanding）

专知会员服务

21+阅读 · 2019年12月12日

【NAACL 2019 workshop】相似语言、变体和方言自然语言处理 The workshop on NLP for Similar Languages, Varieties and Dialects，约翰斯·霍普金斯大学|David Yarowsky

【NAACL 2019 workshop】相似语言、变体和方言自然语言处理 The workshop on NLP for Similar Languages, Varieties and Dialects，约翰斯·霍普金斯大学|David Yarowsky

专知会员服务

5+阅读 · 2019年12月5日

【ACL 2019 Tutorials】深度贝叶斯自然语言处理（Deep Bayesian Natural Language Processing），Jen-Tzung Chien

【ACL 2019 Tutorials】深度贝叶斯自然语言处理（Deep Bayesian Natural Language Processing），Jen-Tzung Chien

专知会员服务

48+阅读 · 2019年11月17日

面向大数据的粒计算理论与方法，山西大学梁吉业教授，第八届全国社会媒体处理大会SMP2019

面向大数据的粒计算理论与方法，山西大学梁吉业教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

40+阅读 · 2019年10月22日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

【Github】GPT2-Chinese：中文的GPT2训练代码

【Github】GPT2-Chinese：中文的GPT2训练代码

AINLP

52+阅读 · 2019年8月23日

弱监督语义分割最新方法资源列表

弱监督语义分割最新方法资源列表

专知

9+阅读 · 2019年2月26日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

【论文推荐】最新九篇机器翻译相关论文—深度多任务学习、深度RNNs、注意焦点、多源神经机器翻译

【论文推荐】最新九篇机器翻译相关论文—深度多任务学习、深度RNNs、注意焦点、多源神经机器翻译

专知

8+阅读 · 2018年6月21日

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

专知

15+阅读 · 2018年5月1日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

carla无人驾驶模拟中文项目 carla_simulator_Chinese

carla无人驾驶模拟中文项目 carla_simulator_Chinese

CreateAMind

3+阅读 · 2018年1月30日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Arxiv

0+阅读 · 2020年12月1日

Why Not Simply Translate? A First Swedish Evaluation Benchmark for Semantic Similarity

Arxiv

0+阅读 · 2020年11月29日

Text-to-Image Synthesis Based on Machine Generated Captions

Text-to-Image Synthesis Based on Machine Generated Captions

Arxiv

3+阅读 · 2019年10月9日

A Simple BERT-Based Approach for Lexical Simplification

A Simple BERT-Based Approach for Lexical Simplification

Arxiv

6+阅读 · 2019年7月16日

CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

Arxiv

6+阅读 · 2019年4月30日

A Unified Model for Joint Chinese Word Segmentation and Dependency Parsing

Arxiv

4+阅读 · 2019年4月9日

CAN-NER: Convolutional Attention Network forChinese Named Entity Recognition

Arxiv

16+阅读 · 2019年4月3日

Simple and Effective Text Simplification Using Semantic and Neural Methods

Arxiv

3+阅读 · 2018年10月11日

Chinese NER Using Lattice LSTM

Arxiv

5+阅读 · 2018年5月5日

Adversarial Learning for Chinese NER from Crowd Annotations

Arxiv

15+阅读 · 2018年1月16日

VIP会员

文章信息

相关主题

Processing（编程语言）

注意力机制

相关VIP内容

最新《文本简化》综述论文，26页pdf，A Survey on Text Simplification

最新《文本简化》综述论文，26页pdf，A Survey on Text Simplification

专知会员服务

15+阅读 · 2020年8月26日

【机器学习术语宝典】机器学习中英文术语表

【机器学习术语宝典】机器学习中英文术语表

专知会员服务

61+阅读 · 2020年7月12日

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

专知会员服务

195+阅读 · 2020年5月31日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

【技术报告】诺亚开源中文预训练语言模型“哪吒”（NEZHA: Neural Contextualized Representation for Chinese Language Understanding）

【技术报告】诺亚开源中文预训练语言模型“哪吒”（NEZHA: Neural Contextualized Representation for Chinese Language Understanding）

专知会员服务

21+阅读 · 2019年12月12日

【NAACL 2019 workshop】相似语言、变体和方言自然语言处理 The workshop on NLP for Similar Languages, Varieties and Dialects，约翰斯·霍普金斯大学|David Yarowsky

【NAACL 2019 workshop】相似语言、变体和方言自然语言处理 The workshop on NLP for Similar Languages, Varieties and Dialects，约翰斯·霍普金斯大学|David Yarowsky

专知会员服务

5+阅读 · 2019年12月5日

【ACL 2019 Tutorials】深度贝叶斯自然语言处理（Deep Bayesian Natural Language Processing），Jen-Tzung Chien

【ACL 2019 Tutorials】深度贝叶斯自然语言处理（Deep Bayesian Natural Language Processing），Jen-Tzung Chien

专知会员服务

48+阅读 · 2019年11月17日

面向大数据的粒计算理论与方法，山西大学梁吉业教授，第八届全国社会媒体处理大会SMP2019

面向大数据的粒计算理论与方法，山西大学梁吉业教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

40+阅读 · 2019年10月22日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICCV2025教程】基础模型遇见具身智能体

军事机器学习设计：关于开发自动化任务摘要系统的梯次化设计科学研究 | 2025最新93页

扩散模型中的缓存方法综述：迈向高效的多模态生成

【ICCV2025教程】《迈向视觉语言模型的全面推理》

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

【Github】GPT2-Chinese：中文的GPT2训练代码

【Github】GPT2-Chinese：中文的GPT2训练代码

AINLP

52+阅读 · 2019年8月23日

弱监督语义分割最新方法资源列表

弱监督语义分割最新方法资源列表

专知

9+阅读 · 2019年2月26日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

【论文推荐】最新九篇机器翻译相关论文—深度多任务学习、深度RNNs、注意焦点、多源神经机器翻译

【论文推荐】最新九篇机器翻译相关论文—深度多任务学习、深度RNNs、注意焦点、多源神经机器翻译

专知

8+阅读 · 2018年6月21日

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

专知

15+阅读 · 2018年5月1日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

carla无人驾驶模拟中文项目 carla_simulator_Chinese

carla无人驾驶模拟中文项目 carla_simulator_Chinese

CreateAMind

3+阅读 · 2018年1月30日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

相关论文

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Arxiv

0+阅读 · 2020年12月1日

Why Not Simply Translate? A First Swedish Evaluation Benchmark for Semantic Similarity

Arxiv

0+阅读 · 2020年11月29日

Text-to-Image Synthesis Based on Machine Generated Captions

Text-to-Image Synthesis Based on Machine Generated Captions

Arxiv

3+阅读 · 2019年10月9日

A Simple BERT-Based Approach for Lexical Simplification

A Simple BERT-Based Approach for Lexical Simplification

Arxiv

6+阅读 · 2019年7月16日

CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

Arxiv

6+阅读 · 2019年4月30日

A Unified Model for Joint Chinese Word Segmentation and Dependency Parsing

Arxiv

4+阅读 · 2019年4月9日

CAN-NER: Convolutional Attention Network forChinese Named Entity Recognition

Arxiv

16+阅读 · 2019年4月3日

Simple and Effective Text Simplification Using Semantic and Neural Methods

Arxiv

3+阅读 · 2018年10月11日

Chinese NER Using Lattice LSTM

Arxiv

5+阅读 · 2018年5月5日

Adversarial Learning for Chinese NER from Crowd Annotations

Arxiv

15+阅读 · 2018年1月16日

微信扫码咨询专知VIP会员