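PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules, and datasets designed to accelerate natural language processing (NLP) research.
Developers interested in joining the community can reach the author on Gitter (https://gitter.im/PyTorch-NLP/Lobby) and the Google Group (https://groups.google.com/forum/#!forum/pytorch-nlp).
GitHub repository: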
https://github.com/PetrochukM/PyTorch-NLP
First make sure you have Python 3.5+ and PyTorch 0.2.0 or newer installed. You can then install pytorch-nlp with pip:
pip install pytorch-nlp
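Before going further, it can help to confirm that the prerequisites are in place. The following is a minimal sketch and not part of PyTorch-NLP: `check_prerequisites` is a hypothetical helper that uses only the standard library to check the interpreter version and whether the `torch` and `torchnlp` modules can be found. It does not verify which PyTorch version is installed.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 5), modules=("torch", "torchnlp")):
    """Report whether the interpreter is new enough and the modules are importable.

    Hypothetical helper for illustration; it only looks the modules up,
    it does not import them or check their versions.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_prerequisites())
```

A `False` entry for `torch` or `torchnlp` suggests the corresponding package still needs to be installed.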
The full documentation is available at:
https://pytorchnlp.readthedocs.io/
For example, load the IMDB dataset:
from torchnlp.datasets import imdb_dataset
# Load the imdb training dataset
train = imdb_dataset(train=True)
train[0] # RETURNS: {'text': 'For a movie that gets..', 'sentiment': 'pos'}
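Because each example is a plain dictionary with `text` and `sentiment` keys, as the return value above shows, ordinary Python tools are enough to inspect a split. Here is a minimal sketch that uses a few made-up stand-in rows in place of the real dataset, so it runs without downloading anything. The rows and the `label_counts` helper are illustrative assumptions, not IMDB data or library API.

```python
from collections import Counter

# Made-up stand-in rows with the same keys as the example above (not real IMDB data).
rows = [
    {"text": "A stand-in positive review.", "sentiment": "pos"},
    {"text": "A stand-in negative review.", "sentiment": "neg"},
    {"text": "Another stand-in positive review.", "sentiment": "pos"},
]


def label_counts(examples):
    """Count how many examples carry each sentiment label."""
    return Counter(example["sentiment"] for example in examples)


counts = label_counts(rows)
print(counts["pos"], counts["neg"])  # prints: 2 1
```

With the library installed, the same helper should work on `train`, provided its rows iterate as dictionaries like `train[0]`.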
As another example, apply a Simple Recurrent Unit (SRU) from the neural network package (http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.nn.html):
from torchnlp.nn import SRU
import torch
input_ = torch.autograd.Variable(torch.randn(6, 3, 10))
sru = SRU(10, 20)
# Apply a Simple Recurrent Unit to `input_`
sru(input_)
# RETURNS: (
# output [torch.FloatTensor (6x3x20)],
# hidden_state [torch.FloatTensor (2x3x20)]
# )
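To give a feel for what an SRU computes, here is a pure-Python sketch of the recurrence as it is commonly written: a forget gate and a reset gate, both computed from the current input only, update a cell state and mix it with the input through a highway connection. This is not torchnlp's implementation. It assumes a single layer, square weight matrices (input size equal to hidden size, unlike the `SRU(10, 20)` call above), `tanh` as the activation, and a zero initial state. The names `sru_layer`, `matvec`, and `sigmoid` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sru_layer(xs, W, Wf, bf, Wr, br):
    """Run one SRU layer over a sequence of vectors `xs`, each of size d.

    Assumes all weight matrices are d x d so the highway term can use x_t directly.
    Returns the per-step outputs and the final cell state.
    """
    d = len(bf)
    c = [0.0] * d  # assumed zero initial cell state
    hs = []
    for x in xs:
        x_tilde = matvec(W, x)                                   # candidate values
        f = [sigmoid(a + b) for a, b in zip(matvec(Wf, x), bf)]  # forget gate
        r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), br)]  # reset gate
        # Cell state: blend of previous state and candidate, elementwise.
        c = [fi * ci + (1 - fi) * xi for fi, ci, xi in zip(f, c, x_tilde)]
        # Output: highway connection between activated state and raw input.
        h = [ri * math.tanh(ci) + (1 - ri) * xi for ri, ci, xi in zip(r, c, x)]
        hs.append(h)
    return hs, c


# With all-zero weights both gates equal 0.5 and the state stays 0,
# so each output is simply half of the corresponding input.
zeros = [[0.0, 0.0], [0.0, 0.0]]
outputs, final_state = sru_layer([[1.0, -2.0], [4.0, 0.0]], zeros, zeros, [0.0, 0.0], zeros, [0.0, 0.0])
print(outputs)  # prints: [[0.5, -1.0], [2.0, 0.0]]
```

Note that the gates depend only on the current input and not on the previous output, so the matrix products for all time steps could be computed up front, leaving only cheap elementwise work inside the loop.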
Among the text encoders (http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.text_encoders.html), WhitespaceEncoder splits text into tokens at whitespace characters:
from torchnlp.text_encoders import WhitespaceEncoder
# Create a `WhitespaceEncoder` with a corpus of text
encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])
# Encode and decode phrases
encoder.encode("this ain't funny.") # RETURNS: torch.LongTensor([6, 7, 1])
encoder.decode(encoder.encode("This ain't funny.")) # RETURNS: "this ain't funny."
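The idea behind such an encoder can be sketched in a few lines of plain Python: build a vocabulary from the corpus, then map each whitespace-separated token to its index. The class below is a hypothetical stand-in and not the library's code. The five reserved tokens and their order are an assumption, chosen so that the resulting indices line up with the `[6, 7, 1]` output quoted above.

```python
class SimpleWhitespaceEncoder:
    """Hypothetical stand-in that maps whitespace-separated tokens to integer ids."""

    # Assumed reserved tokens; index 1 is used for unknown tokens.
    RESERVED = ["<pad>", "<unk>", "</s>", "<s>", "<copy>"]
    UNKNOWN_INDEX = 1

    def __init__(self, corpus):
        self.itos = list(self.RESERVED)
        self.stoi = {token: index for index, token in enumerate(self.itos)}
        for sentence in corpus:
            for token in sentence.split():
                if token not in self.stoi:
                    self.stoi[token] = len(self.itos)
                    self.itos.append(token)

    def encode(self, text):
        # Tokens missing from the vocabulary fall back to the unknown index.
        return [self.stoi.get(token, self.UNKNOWN_INDEX) for token in text.split()]

    def decode(self, ids):
        return " ".join(self.itos[index] for index in ids)


encoder = SimpleWhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])
ids = encoder.encode("this ain't funny.")
print(ids)                        # prints: [6, 7, 1]
print(encoder.decode([6, 7, 8]))  # prints: this ain't funny
```

In this sketch `funny.` (with the trailing period) is not in the vocabulary, so it maps to the unknown index 1, while `this` and `ain't` receive the indices assigned when the corpus was scanned.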
Pretrained word vectors (http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.word_to_vector.html) can be loaded as well, for example FastText:
from torchnlp.word_to_vector import FastText
vectors = FastText()
# Load vectors for any word as a `torch.FloatTensor`
vectors['hello'] # RETURNS: [torch.FloatTensor of size 100]
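Conceptually, a word-vector object behaves like a lookup table from words to fixed-size vectors. A minimal sketch of that idea, using made-up three-dimensional numbers rather than real FastText vectors: `ToyVectors` is a hypothetical class, and returning a zero vector for unseen words is one common convention assumed here, not something this article states about torchnlp.

```python
class ToyVectors:
    """Hypothetical word-vector table with a zero-vector fallback."""

    def __init__(self, table, dim):
        self.table = table
        self.dim = dim

    def __getitem__(self, word):
        # Return a copy so callers cannot mutate the stored vector.
        return list(self.table.get(word, [0.0] * self.dim))


toy_vectors = ToyVectors({"hello": [0.1, -0.2, 0.3]}, dim=3)  # made-up numbers
print(toy_vectors["hello"])  # prints: [0.1, -0.2, 0.3]
print(toy_vectors["zzzzz"])  # prints: [0.0, 0.0, 0.0]
```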
Finally, compute common metrics (http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.metrics.html), such as the BLEU score:
from torchnlp.metrics import get_moses_multi_bleu
hypotheses = ["The brown fox jumps over the dog 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑"]
# Compute BLEU score with the official BLEU perl script
get_moses_multi_bleu(hypotheses, references, lowercase=True) # RETURNS: 47.9
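To see where a number like this comes from, BLEU can be sketched in plain Python: clipped n-gram precisions for n = 1 to 4 are combined by a geometric mean and scaled by a brevity penalty. This is a simplified illustration, not the Perl script the library calls. It assumes whitespace tokenization, exactly one reference per hypothesis, and no smoothing, and `corpus_bleu` and `ngrams` are hypothetical names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4, lowercase=False):
    """Simplified BLEU (0 to 100) with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        if lowercase:
            hyp, ref = hyp.lower(), ref.lower()
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp_tokens, n), ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


score = corpus_bleu(
    ["The brown fox jumps over the dog 笑"],
    ["The quick brown fox jumps over the lazy dog 笑"],
    lowercase=True,
)
print(round(score, 1))  # prints: 47.9
```

For this sentence pair the n-gram precisions are 8/8, 5/7, 3/6, and 2/5, and the brevity penalty is exp(1 - 10/8), which together give the same 47.9 as the example above.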