深入研究中华文字的内部结构 (An In-depth Study on Internal Structure of Chinese Words) - 专知论文

会员服务 ·

0

树库 · 句法分析 · Better · SimPLe · 相互独立的 ·

2021 年 6 月 1 日

An In-depth Study on Internal Structure of Chinese Words

翻译：深入研究中华文字的内部结构

Chen Gong,Saihao Huang,Houquan Zhou,Zhenghua Li,Min Zhang,Zhefeng Wang,Baoxing Huai,Nicholas Jing Yuan

from arxiv, Accepted by ACL-IJCNLP 2021 (long paper)

Unlike English letters, Chinese characters have rich and specific meanings. Usually, the meaning of a word can be derived from its constituent characters in some way. Several previous works on syntactic parsing propose to annotate shallow word-internal structures for better utilizing character-level information. This work proposes to model the deep internal structures of Chinese words as dependency trees with 11 labels for distinguishing syntactic relationships. First, based on newly compiled annotation guidelines, we manually annotate a word-internal structure treebank (WIST) consisting of over 30K multi-char words from Chinese Penn Treebank. To guarantee quality, each word is independently annotated by two annotators and inconsistencies are handled by a third senior annotator. Second, we present detailed and interesting analysis on WIST to reveal insights on Chinese word formation. Third, we propose word-internal structure parsing as a new task, and conduct benchmark experiments using a competitive dependency parser. Finally, we present two simple ways to encode word-internal structures, leading to promising gains on the sentence-level syntactic parsing task.

翻译：与英文字母不同, 中文字符具有丰富和具体的含义。通常, 单词的含义可以以某种方式从其组成字符中得出。先前的几项合成分析提议对浅单词内部结构进行批注, 以便更好地利用字符级信息。这项工作建议将中国单词的深层内部结构建模为依赖树, 并配有11个标签, 以区分合成关系。首先, 根据新编的批注指南, 我们手写一个单词内部结构树库( WIST), 由中国彭树银行的30K多个字组成。为了保证质量, 每个单词都由两个注解者独立附加说明, 由第三个高级批注者处理不一致之处。其次, 我们对WIST进行详细和有趣的分析, 以揭示对中文单词构成的洞见。第三, 我们提出单词内部结构的分类是一项新任务, 并使用竞争性的依赖分析器进行基准实验。最后, 我们提出了两种简单的词内部结构编码方法, 从而有望在句级合成拼写任务上取得成果。

0

相关内容

【Manning新书】C++并行实战，592页pdf，C++ Concurrency in Action

【Manning新书】C++并行实战，592页pdf，C++ Concurrency in Action

专知会员服务

63+阅读 · 2021年1月16日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

人工智能 | CCF推荐期刊专刊约稿信息6条

人工智能 | CCF推荐期刊专刊约稿信息6条

Call4Papers

5+阅读 · 2019年2月18日

已删除

将门创投

6+阅读 · 2019年1月11日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

CCF B类期刊IPM专刊截稿信息1条

CCF B类期刊IPM专刊截稿信息1条

Call4Papers

3+阅读 · 2018年10月11日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Rethinking Hard-Parameter Sharing in Multi-Task Learning

Arxiv

1+阅读 · 2021年7月23日

Simplify the Usage of Lexicon in Chinese NER

Arxiv

5+阅读 · 2020年10月14日

Graph Neural Networks: A Review of Methods and Applications

Graph Neural Networks: A Review of Methods and Applications

Arxiv

9+阅读 · 2019年3月7日

code2seq: Generating Sequences from Structured Representations of Code

code2seq: Generating Sequences from Structured Representations of Code

Arxiv

3+阅读 · 2019年2月6日

One for All: Neural Joint Modeling of Entities and Events

Arxiv

11+阅读 · 2018年12月1日

Jointly Learning to Label Sentences and Tokens

Arxiv

3+阅读 · 2018年11月14日

Learning with Interpretable Structure from RNN

Arxiv

19+阅读 · 2018年10月25日

Super Characters: A Conversion from Sentiment Classification to Image Classification

Arxiv

4+阅读 · 2018年10月15日

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Arxiv

19+阅读 · 2018年5月25日

Chinese NER Using Lattice LSTM

Arxiv

14+阅读 · 2018年5月15日

VIP会员

文章信息

相关主题

相互独立的

相关VIP内容

【Manning新书】C++并行实战，592页pdf，C++ Concurrency in Action

【Manning新书】C++并行实战，592页pdf，C++ Concurrency in Action

专知会员服务

63+阅读 · 2021年1月16日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

【伯克利博士论文】通过真实世界实践赋能机器人自主性

军用无人机集群技术尚未成熟——但潜力可期

人工智能安全治理白皮书（2025）

AgentOps综述：分类、挑战与未来方向

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

人工智能 | CCF推荐期刊专刊约稿信息6条

人工智能 | CCF推荐期刊专刊约稿信息6条

Call4Papers

5+阅读 · 2019年2月18日

已删除

将门创投

6+阅读 · 2019年1月11日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

CCF B类期刊IPM专刊截稿信息1条

CCF B类期刊IPM专刊截稿信息1条

Call4Papers

3+阅读 · 2018年10月11日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Rethinking Hard-Parameter Sharing in Multi-Task Learning

Arxiv

1+阅读 · 2021年7月23日

Simplify the Usage of Lexicon in Chinese NER

Arxiv

5+阅读 · 2020年10月14日

Graph Neural Networks: A Review of Methods and Applications

Graph Neural Networks: A Review of Methods and Applications

Arxiv

9+阅读 · 2019年3月7日

code2seq: Generating Sequences from Structured Representations of Code

code2seq: Generating Sequences from Structured Representations of Code

Arxiv

3+阅读 · 2019年2月6日

One for All: Neural Joint Modeling of Entities and Events

Arxiv

11+阅读 · 2018年12月1日

Jointly Learning to Label Sentences and Tokens

Arxiv

3+阅读 · 2018年11月14日

Learning with Interpretable Structure from RNN

Arxiv

19+阅读 · 2018年10月25日

Super Characters: A Conversion from Sentiment Classification to Image Classification

Arxiv

4+阅读 · 2018年10月15日

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Arxiv

19+阅读 · 2018年5月25日

Chinese NER Using Lattice LSTM

Arxiv

14+阅读 · 2018年5月15日

微信扫码咨询专知VIP会员