Borrowing or Codeswitching? Annotating for Finer-Grained Distinctions in Language Mixing

June 10, 2022

Elena Alvarez Mellado,Constantine Lignos

From arXiv; LREC 2022

We present a new corpus of Twitter data annotated for codeswitching and borrowing between Spanish and English. The corpus contains 9,500 tweets annotated at the token level with codeswitches, borrowings, and named entities. This corpus differs from prior corpora of codeswitching in that we attempt to clearly define and annotate the boundary between codeswitching and borrowing and do not treat common "internet-speak" ('lol', etc.) as codeswitching when used in an otherwise monolingual context. The result is a corpus that enables the study and modeling of Spanish-English borrowing and codeswitching on Twitter in one dataset. We present baseline scores for modeling the labels of this corpus using Transformer-based language models. The annotation itself is released with a CC BY 4.0 license, while the text it applies to is distributed in compliance with the Twitter terms of service.
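The abstract describes token-level labels for three categories (codeswitches, borrowings, named entities) and mentions baseline scores without naming the metric. A common way to handle such labels is to decode them into spans and score the spans by exact match. The sketch below shows that approach. It assumes a BIO encoding with the hypothetical tag names `BORROWING`, `CODESWITCH` and `ENTITY`. The released corpus may use a different file format and label set, and exact-match span F1 is an assumed metric, not necessarily the one the authors report. The example tweets are invented. Following the abstract's point about internet-speak, `lol` is left unlabeled in the first example.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, label).
# Label names used below are hypothetical, not taken from the released corpus.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags such as 'B-BORROWING', 'I-BORROWING', 'O' into labeled spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            # Close any open span before handling a B-, O, or mismatched I- tag.
            if label is not None:
                spans.append((start, i, label))
                start, label = None, None
            # Lenient decoding: a stray I- tag also opens a new span.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 over a list of tagged sequences."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g = set(bio_to_spans(gold_tags))
        p = set(bio_to_spans(pred_tags))
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented tweet 1: "Me encanta el look de Taylor Swift lol"
    gold_1 = ["O", "O", "O", "B-BORROWING", "O", "B-ENTITY", "I-ENTITY", "O"]
    pred_1 = ["O", "O", "O", "B-BORROWING", "O", "B-ENTITY", "O", "O"]
    # Invented tweet 2: "Estoy cansada , I need coffee"
    gold_2 = ["O", "O", "O", "B-CODESWITCH", "I-CODESWITCH", "I-CODESWITCH"]
    pred_2 = ["O", "O", "O", "B-CODESWITCH", "I-CODESWITCH", "I-CODESWITCH"]

    p, r, f = span_f1([gold_1, gold_2], [pred_1, pred_2])
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints "P=0.667 R=0.667 F1=0.667"
```

Under exact-match scoring, a partially recovered span (here `Taylor` without `Swift`) counts as both a false positive and a false negative. That is why a single boundary error lowers precision and recall together.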
