Urdu is a widely spoken language in South Asia. Although a considerable body of literature exists in Urdu, the available data is still insufficient for processing the language with NLP techniques. Highly efficient language models exist for English, a high-resource language, but Urdu and other under-resourced languages have long been neglected. To build efficient language models for these languages, we must first have good word embedding models. For Urdu, the only word embeddings available so far were trained and developed using the skip-gram model. In this paper, we build a corpus for Urdu by scraping and integrating data from various sources and compile a vocabulary for the Urdu language. We also adapt fastText embeddings and N-gram models so that they can be trained on our corpus. We use these trained embeddings for a word similarity task and compare the results with existing techniques.
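To illustrate the subword mechanism behind fastText embeddings referenced above, the following is a minimal sketch of how fastText composes a word vector from hashed character n-grams. The Urdu example words, the dimension, and the randomly seeded subword vectors are illustrative placeholders (real fastText learns these vectors during training); the sketch only shows why morphologically related Urdu words end up with overlapping representations.

```python
import hashlib
import math
import random

def char_ngrams(word, n_min=3, n_max=6):
    # fastText wraps each word in boundary markers before extracting n-grams
    w = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(w[i:i + n] for i in range(len(w) - n + 1))
    return grams

def bucket(gram, num_buckets=2_000_000):
    # each n-gram is hashed into a fixed table of subword vectors
    return int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % num_buckets

def word_vector(word, dim=50):
    # a word vector is the sum of its subword vectors; real fastText learns
    # these vectors, here each bucket gets a deterministic random stand-in
    vec = [0.0] * dim
    for g in char_ngrams(word):
        rng = random.Random(bucket(g))
        for i in range(dim):
            vec[i] += rng.uniform(-1.0, 1.0)
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# "پاکستان" and "پاکستانی" share many character n-grams, so their composed
# vectors overlap far more than those of an unrelated word such as "کتاب"
sim_related = cosine(word_vector("پاکستانی"), word_vector("پاکستان"))
sim_unrelated = cosine(word_vector("پاکستانی"), word_vector("کتاب"))
print(sim_related > sim_unrelated)
```

Because shared n-grams contribute identical subword vectors to both sums, related word forms score higher on cosine similarity even before training, which is what makes subword models attractive for a morphologically rich, under-resourced language like Urdu.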