NusaCrowd: Open Source Initiative for Indonesian NLP Resources

June 5, 2023

Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti

We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
