机器之心报道
参与:一鸣、张倩
模型太多往往也是个问题,特别是开发者需要逐个学习每个模型的使用方法。最近,RTI International 公司的数据科学家们开发了一个统一的语言模型库 gobbli。用户可以像使用 Keras 那样直接上手文本分类任务,还有很多知名的语言模型可以选择,如 BERT 等。
项目地址:https://github.com/RTIInternational/gobbli
使用文档:https://gobbli.readthedocs.io/en/latest/quickstart.html
from gobbli.experiment import ClassificationExperiment
from gobbli.model import MajorityClassifier
X = ["This is positive.","This is negative.","This is bad.","This is good.","This is really bad.","This is really good.","This is pretty good.","This is pretty bad.",]
y = ["Good","Bad","Bad","Good","Bad","Good","Good","Bad",]
exp = ClassificationExperiment(model_cls=MajorityClassifier,dataset=(X, y))
results = exp.run()
from gobbli.io import TrainInput
train_input = TrainInput(
# X_train: A list of strings to classify
X_train=["This is a training document.","This is another training document."],
# y_train: The true class for each string in X_train
y_train=["0", "1"],
# And likewise for validation
X_valid=["This is a validation sentence.","This is another validation sentence."],
y_valid=["1","0"],
# Number of documents to train on at once
train_batch_size=1,
# Number of documents to evaluate at once
valid_batch_size=1,
# Number of times to iterate over the training set
num_train_epochs=1)
from gobbli.model import MajorityClassifier
clf = MajorityClassifier()# Set up classifier resources -- Docker image, etc.
clf.build()
train_output = clf.train*(train_input)
from gobbli.io import PredictInput
predict_input = PredictInput(
# X: A list of strings to predict the trained classes for
X=["Which class is this document?"],
# Pass the set of labels and the trained checkpoint
# from the training output
labels=train_output.labels,
checkpoint=train_output.checkpoint,
# Number of documents to predict at once
predict_batch_size=1)
predict_output = clf.predict(predict_input)
pip install gobbli