December 7, 2022

AfroLID: A Neural Language Identification Tool for African Languages

Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Alcides Alcoba Inciarte

From arXiv. To appear at the EMNLP 2022 main conference.

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural LID toolkit for 517 African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 language families utilizing five orthographic systems. When evaluated on our blind Test set, AfroLID achieves an F1 score of 95.89. We also compare AfroLID to five existing LID tools that each cover a small number of African languages, finding it to outperform them on most languages. We further show the utility of AfroLID in the wild by testing it on the acutely under-served Twitter domain. Finally, we offer a number of controlled case studies and perform a linguistically-motivated error analysis that allow us to showcase both AfroLID's powerful capabilities and its limitations.
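
As a reading aid for the F1 figure in the abstract, here is a minimal sketch of how an F1 score over many language labels can be computed. The abstract does not say which averaging scheme is used; this sketch assumes macro averaging (an unweighted mean of per-language F1). The function name `macro_f1`, the language codes, and the toy predictions are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred.

    Assumption: macro averaging; the abstract does not specify the scheme.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for label g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true label but was missed
    scores = []
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall for this label.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data; the codes are placeholders for language labels.
gold = ["yor", "yor", "swh", "hau", "hau", "hau"]
pred = ["yor", "swh", "swh", "hau", "hau", "yor"]
print(round(macro_f1(gold, pred), 4))
```

Under macro averaging every language counts equally regardless of how many test sentences it has, so weak performance on a rare language lowers the score as much as weak performance on a frequent one.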
