PALI: A Language Identification Benchmark for Perso-Arabic Scripts


Hierarchical model · Benchmark · Low-resource · Hierarchical · Classifier

April 3, 2023


Sina Ahmadi, Milind Agarwal, Antonios Anastasopoulos

From arXiv, 13 pages. Accepted at VarDial at EACL 2023.

The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various linguistic communities around the globe. Identifying the languages written in such scripts is crucial to language technologies and is challenging in low-resource setups. This paper therefore sheds light on the challenges of detecting languages that use Perso-Arabic scripts, especially in bilingual communities where "unconventional" writing is practiced. To address this, we use a set of supervised techniques to classify sentences into their languages. Building on these, we also propose a hierarchical model that targets clusters of languages that are more often confused by the classifiers. Our experimental results indicate the effectiveness of our solutions.
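The two-stage idea in the abstract (a supervised sentence classifier, followed by a second pass over clusters of easily confused languages) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the abstract does not name the classifiers or features, so the character n-gram naive Bayes model, the class names `NGramNB` and `HierarchicalLID`, the labels `lang_a`/`lang_b`/`lang_c`, the cluster definition, and the toy sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_max=3):
    """Return all character n-grams of length 1..n_max from a space-padded sentence."""
    padded = f" {text} "
    return [padded[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NGramNB:
    """Multinomial naive Bayes over character n-grams with add-one smoothing.

    Assumed stand-in for the unspecified supervised classifiers.
    """

    def fit(self, sentences, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.docs = Counter(labels)          # label -> number of sentences
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(char_ngrams(sentence))
        self.vocab = {g for grams in self.counts.values() for g in grams}
        return self

    def predict(self, sentence):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, grams in self.counts.items():
            # The extra +1 reserves probability mass for unseen n-grams.
            denom = sum(grams.values()) + len(self.vocab) + 1
            score = math.log(self.docs[label] / total_docs)
            for g in char_ngrams(sentence):
                score += math.log((grams[g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class HierarchicalLID:
    """Root classifier plus one specialist classifier per confusable cluster."""

    def __init__(self, clusters):
        # Hypothetical clusters; in practice they could be read off a confusion matrix.
        self.clusters = [frozenset(c) for c in clusters]

    def fit(self, sentences, labels):
        self.root = NGramNB().fit(sentences, labels)
        self.experts = {}
        for cluster in self.clusters:
            # Assumes every cluster has at least one training sentence.
            subset = [(s, l) for s, l in zip(sentences, labels) if l in cluster]
            xs, ys = zip(*subset)
            self.experts[cluster] = NGramNB().fit(xs, ys)
        return self

    def predict(self, sentence):
        label = self.root.predict(sentence)
        for cluster, expert in self.experts.items():
            if label in cluster:
                # Re-decide among the cluster's languages only.
                return expert.predict(sentence)
        return label


# Placeholder training data: synthetic strings, not real Perso-Arabic text.
train = [
    ("aaa bab aab", "lang_a"), ("aba aab baa", "lang_a"),
    ("aaa bab aac", "lang_b"), ("aca aac caa", "lang_b"),
    ("xyz zyx yxz", "lang_c"), ("zzy xyx yzz", "lang_c"),
]
sentences, labels = zip(*train)
model = HierarchicalLID(clusters=[{"lang_a", "lang_b"}]).fit(sentences, labels)
print(model.predict("aca caa"), model.predict("xyz yzx"))
```

The point of the second stage is that the specialist model is trained only on the languages inside one cluster, so the n-grams that separate those languages are not diluted by the rest of the label set.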

