GitHub Project Recommendation | PyTorch-NLP: An Open-Source Python Library for Natural Language Processing

March 20, 2018 | AI研习社 (AI Yanxishe) | 孔令双

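PyTorch-NLP is an open-source Python library for natural language processing. It builds on recent research and is designed to help developers prototype quickly. PyTorch-NLP ships with pre-trained embeddings, samplers, dataset loaders, neural network models, and text encoders.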
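To show what a text encoder does, here is a minimal pure-Python sketch that maps whitespace-separated tokens to integer ids and back. The class name WhitespaceVocabEncoder and the reserved <pad>/<unk> ids are illustrative assumptions. This is not PyTorch-NLP's own encoder API.

```python
from collections import Counter
from typing import Dict, List


class WhitespaceVocabEncoder:
    """Hypothetical minimal text encoder (not PyTorch-NLP's API).

    Splits text on whitespace and maps each token to an integer id.
    Ids 0 and 1 are reserved for padding and unknown tokens (an assumption).
    """

    PAD, UNK = "<pad>", "<unk>"

    def __init__(self, corpus: List[str], min_count: int = 1) -> None:
        counts = Counter(token for line in corpus for token in line.split())
        self.itos: List[str] = [self.PAD, self.UNK]
        # Most frequent tokens first; ties broken alphabetically so ids are deterministic.
        for token, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            if count >= min_count:
                self.itos.append(token)
        self.stoi: Dict[str, int] = {token: i for i, token in enumerate(self.itos)}

    def encode(self, text: str) -> List[int]:
        unk_id = self.stoi[self.UNK]
        return [self.stoi.get(token, unk_id) for token in text.split()]

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.itos[i] for i in ids)


encoder = WhitespaceVocabEncoder(["the cat sat", "the dog sat"])
print(encoder.encode("the cat flew"))  # [3, 4, 1]
print(encoder.decode([3, 4, 1]))       # the cat <unk>
```

Words missing from the vocabulary map to the <unk> id, so encoding never fails on unseen text.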

For more details, see the official PyTorch-NLP documentation:

https://pytorchnlp.readthedocs.io/en/latest/

GitHub repository:

https://github.com/PetrochukM/PyTorch-NLP

Installation

First install Python 3.5+ and PyTorch 0.2.0 or later, then install PyTorch-NLP with pip:

pip install pytorch-nlp

Optional installation

If you want to use the English tokenizer from spaCy <http://spacy.io/>, install spaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you may want to use the Moses tokenizer from NLTK <http://nltk.org/>. In that case, install NLTK and download the required data:

pip install nltk
python -m nltk.downloader perluniprops nonbreaking_prefixes
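Because both tokenizers are optional, a script may want to check which ones are present before picking one. The following is a minimal, hypothetical helper (available_backends is not part of PyTorch-NLP) built on the standard library's importlib.util.find_spec. It only detects whether the packages are installed. It does not check for the spaCy model or the NLTK data downloaded above.

```python
import importlib.util
from typing import Dict, Iterable


def available_backends(names: Iterable[str] = ("spacy", "nltk")) -> Dict[str, bool]:
    """Report which optional packages can be imported, without importing them."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available_backends().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```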

Usage

PyTorch-NLP aims for a simple, intuitive API. A few examples:

Load pre-trained FastText word vectors (from Facebook's fastText):

from torchnlp.embeddings import FastText
vectors = FastText()
vectors['hello']  # [torch.FloatTensor of size 100]
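Pre-trained fastText vectors are commonly distributed as plain-text .vec files. An embedding object like the one above essentially maps each word to its row of floats. The sketch below illustrates that idea in pure Python and is not torchnlp's implementation. load_vec and lookup are hypothetical names, the tiny 3-dimensional sample stands in for a real file, and returning a zero vector for unknown words is an assumed convention.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_vec(stream: TextIO) -> Tuple[Dict[str, List[float]], int]:
    """Parse word vectors in fastText's plain-text .vec format.

    The header line is "<num_words> <dim>"; every following line is a word
    followed by <dim> space-separated floats.
    """
    _num_words, dim = map(int, stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"{word!r}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    return vectors, dim


def lookup(vectors: Dict[str, List[float]], word: str, dim: int) -> List[float]:
    # Assumption: out-of-vocabulary words fall back to an all-zero vector.
    return vectors.get(word, [0.0] * dim)


sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vectors, dim = load_vec(sample)
print(lookup(vectors, "hello", dim))   # [0.1, 0.2, 0.3]
print(lookup(vectors, "unseen", dim))  # [0.0, 0.0, 0.0]
```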

Load a dataset, such as IMDB:

from torchnlp.datasets import imdb_dataset
train = imdb_dataset(train=True)
train[0]  # {'text': 'For a movie that gets..', 'sentiment': 'pos'}
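Each row in the example is a plain dict, so preparing it for a classifier is ordinary Python. The sketch below converts such rows into (text, label) pairs and counts the label distribution. Apart from the first row, which mirrors the output above, the sample rows are made up. The neg/pos to 0/1 mapping is an assumption, not something imdb_dataset provides.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical rows shaped like the output above; real IMDB rows contain full reviews.
rows = [
    {"text": "For a movie that gets..", "sentiment": "pos"},
    {"text": "A dull, lifeless plot.", "sentiment": "neg"},
    {"text": "Wonderful acting throughout.", "sentiment": "pos"},
]

# Assumption: a binary label mapping; adjust it to the labels your data actually uses.
LABEL_TO_ID: Dict[str, int] = {"neg": 0, "pos": 1}


def to_examples(rows: List[Dict[str, str]], label_to_id: Dict[str, int]) -> List[Tuple[str, int]]:
    """Turn {'text', 'sentiment'} rows into (text, integer label) pairs."""
    return [(row["text"], label_to_id[row["sentiment"]]) for row in rows]


examples = to_examples(rows, LABEL_TO_ID)
print(examples[1])                              # ('A dull, lifeless plot.', 0)
print(Counter(label for _, label in examples))  # Counter({1: 2, 0: 1})
```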

Compute a BLEU score with torchnlp.metrics:

from torchnlp.metrics import get_moses_multi_bleu
hypotheses = ["The brown fox jumps over the dog 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑"]
get_moses_multi_bleu(hypotheses, references, lowercase=True)  # 47.9
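As its name suggests, get_moses_multi_bleu is based on the Moses multi-bleu script. The following pure-Python sketch shows what that number measures. It implements standard corpus-level BLEU: clipped n-gram precision for n = 1 to 4, a geometric mean, and a brevity penalty, using whitespace tokenization and one reference per hypothesis. corpus_bleu is a hypothetical helper, not part of torchnlp, and it may diverge from the Perl script on other inputs. For the example above, a hand calculation with this sketch also gives about 47.9.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram (as a tuple) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str],
                max_n: int = 4, lowercase: bool = False) -> float:
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Hypothetical helper for illustration; tokenization is a plain whitespace split.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        if lowercase:
            hyp, ref = hyp.lower(), ref.lower()
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clipping: a hypothesis n-gram earns credit at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses that are shorter overall than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


hypotheses = ["The brown fox jumps over the dog 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑"]
print(round(corpus_bleu(hypotheses, references, lowercase=True), 1))  # 47.9
```

In this example the precisions are 8/8, 5/7, 3/6 and 2/5. The hypothesis has 8 tokens against 10 in the reference, so the brevity penalty is exp(1 - 10/8).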

