While there has been significant progress towards developing NLU resources for Indic languages, syntactic evaluation has remained relatively underexplored. Unlike English, Indic languages have rich morphosyntax, grammatical gender, relatively free word order, and highly inflectional morphology. In this paper, we introduce Vy\=akarana: a benchmark of gender-balanced Colorless Green sentences in Indic languages for the syntactic evaluation of multilingual language models. The benchmark comprises four syntax-related tasks: PoS Tagging, Syntax Tree-depth Prediction, Grammatical Case Marking, and Subject-Verb Agreement. We use the datasets from these evaluation tasks to probe five multilingual language models of varying architectures for syntax in Indic languages. Given its prevalence, we also include a code-switching setting in our experiments. Our results show that the token-level and sentence-level representations from the Indic language models (IndicBERT and MuRIL) do not capture syntax in Indic languages as effectively as those from the other, more broadly multilingual language models. Further, our layer-wise probing experiments reveal that while mBERT, DistilmBERT, and XLM-R localize syntax in their middle layers, the Indic language models show no such syntactic localization.
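As an illustration only, the sketch below shows a minimal layer-wise probing setup of the kind described above: extract per-layer hidden states from a multilingual encoder and fit a linear probe on a sentence-level label (e.g. a tree-depth bucket). The checkpoint name, mean pooling, and logistic-regression probe are assumptions for illustration, not the paper's exact pipeline.

import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# Assumed checkpoint for MuRIL, one of the probed Indic models; any encoder works here.
name = "google/muril-base-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

def layer_features(sentences, layer):
    """Mean-pooled sentence vectors taken from a given encoder layer (0 = embeddings)."""
    feats = []
    with torch.no_grad():
        for s in sentences:
            enc = tok(s, return_tensors="pt", truncation=True)
            hidden = model(**enc).hidden_states[layer]        # shape: (1, seq_len, dim)
            feats.append(hidden.mean(dim=1).squeeze(0).numpy())
    return feats

# Placeholder data; in practice these would be Colorless Green sentences with task labels.
train_sents, train_labels = ["sentence one", "sentence two"], [0, 1]
for layer in range(model.config.num_hidden_layers + 1):
    X = layer_features(train_sents, layer)
    probe = LogisticRegression(max_iter=1000).fit(X, train_labels)
    # Held-out probe accuracy per layer indicates where syntactic information is localized.

A token-level task such as PoS Tagging would use per-token hidden states rather than pooled sentence vectors, but the layer-wise comparison proceeds the same way.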