[Overview] BTB, short for Bayesian Tuning and Bandits, is an open-source library from MIT: a simple, extensible backend for auto-tuning systems.
GitHub: https://github.com/HDI-Project/BTB
Documentation: https://hdi-project.github.io/BTB/
Requirements
BTB runs on Python 3.5, 3.6, and 3.7.
Installation
BTB can be installed either with pip or from source.
With pip:
pip install baytune
From source:
git clone git@github.com:HDI-Project/BTB.git
cd BTB
git checkout stable
make install
Usage
Tuners
A Tuner quickly finds the best hyperparameters for a given machine learning algorithm.
The tuner classes are defined in btb.tuning.tuners. Each tuning iteration works as follows:
the tuner proposes a set of hyperparameters;
the hyperparameters are applied to the model, and the model is scored;
the score is passed back to the tuner.
In each iteration, the tuner uses the information gathered so far to propose the hyperparameters most likely to yield a high score.
To instantiate a Tuner, we need a Tunable class and a set of hyperparameters:
from btb.tuning import Tunable
from btb.tuning.tuners import GPTuner
from btb.tuning.hyperparams import IntHyperParam
hyperparams = {
'n_estimators': IntHyperParam(min=10, max=500),
'max_depth': IntHyperParam(min=3, max=20),
}
tunable = Tunable(hyperparams)
tuner = GPTuner(tunable)
Then, inside a loop, we repeat the following three steps.
Let the tuner propose a set of hyperparameters:
>>> parameters = tuner.propose()
>>> parameters
{'n_estimators': 297, 'max_depth': 3}
Apply the hyperparameters to the model and score it:
>>> model = RandomForestClassifier(**parameters)
>>> model.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=3, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=297, n_jobs=1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
>>> score = model.score(X_test, y_test)
>>> score
0.77
Pass the score back to the tuner:
>>> tuner.record(parameters, score)
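Putting the three steps together gives the tuning loop sketched below. To keep the sketch runnable without BTB or scikit-learn installed, a hypothetical `RandomSearchTuner` exposing the same `propose()`/`record()` interface and a toy scoring function stand in for `GPTuner` and the real model; they are illustrations, not BTB's implementation:

```python
import random

class RandomSearchTuner:
    """Hypothetical stand-in exposing BTB's propose()/record() interface."""

    def __init__(self, ranges):
        self.ranges = ranges   # {name: (low, high)} integer ranges
        self.scores = []       # scores recorded so far

    def propose(self):
        # Propose a candidate uniformly at random. A GPTuner would instead
        # use the recorded scores to propose promising candidates.
        return {name: random.randint(lo, hi)
                for name, (lo, hi) in self.ranges.items()}

    def record(self, params, score):
        self.scores.append(score)

def score_model(params):
    # Toy objective standing in for model.fit(...) / model.score(...):
    # it peaks when n_estimators is close to 300.
    return 1.0 - abs(params['n_estimators'] - 300) / 500

random.seed(0)
tuner = RandomSearchTuner({'n_estimators': (10, 500), 'max_depth': (3, 20)})

best_params, best_score = None, float('-inf')
for _ in range(30):
    params = tuner.propose()        # 1. get a hyperparameter proposal
    score = score_model(params)     # 2. evaluate the model with it
    tuner.record(params, score)     # 3. report the score back
    if score > best_score:
        best_params, best_score = params, score
```

With BTB itself, the loop body is identical; only `RandomSearchTuner` and `score_model` are replaced by `GPTuner(tunable)` and a real fit/score step.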
Selectors
A Selector coordinates a collection of tuners: it decides which model is most promising to try next. To use a selector, we create a Tuner instance for each model, plus a Selector:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from btb.selection import UCB1
from btb.tuning import Tunable
from btb.tuning.tuners import GPTuner
from btb.tuning.hyperparams import FloatHyperParam, IntHyperParam
models = {
'RF': RandomForestClassifier,
'SVC': SVC
}
selector = UCB1(['RF', 'SVC'])
rf_hyperparams = {
'n_estimators': IntHyperParam(min=10, max=500),
'max_depth': IntHyperParam(min=3, max=20)
}
rf_tunable = Tunable(rf_hyperparams)
svc_hyperparams = {
'C': FloatHyperParam(min=0.01, max=10.0),
'gamma': FloatHyperParam(0.000000001, 0.0000001)
}
svc_tunable = Tunable(svc_hyperparams)
tuners = {
'RF': GPTuner(rf_tunable),
'SVC': GPTuner(svc_tunable)
}
Then, inside a loop, we repeat the following steps.
Pass all scores obtained so far to the selector and let it decide which model to test next:
>>> next_choice = selector.select({
...     'RF': tuners['RF'].scores,
...     'SVC': tuners['SVC'].scores
... })
>>> next_choice
'RF'
Obtain a new set of parameters from the chosen model's tuner and create a new model instance:
>>> parameters = tuners[next_choice].propose()
>>> parameters
{'n_estimators': 289, 'max_depth': 18}
>>> model = models[next_choice](**parameters)
Evaluate the model and pass the score back to that tuner:
>>> model.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=18, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=289, n_jobs=1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
>>> score = model.score(X_test, y_test)
>>> score
0.89
>>> tuners[next_choice].record(parameters, score)
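The whole selector workflow can be sketched end-to-end. UCB1 picks the model whose mean score plus an exploration bonus sqrt(2 ln N / n) is highest, where N is the total number of trials and n the number of trials for that model. The `ucb1_select` function and `ToyTuner` class below are simplified stand-ins written for this sketch, not BTB's actual `UCB1` and `GPTuner`; toy scoring lambdas replace model fitting:

```python
import math
import random

def ucb1_select(scores_by_model):
    """Pick the model maximizing mean score + sqrt(2 ln N / n) (UCB1).
    Models that have never been tried are selected first."""
    total = sum(len(s) for s in scores_by_model.values())
    best_model, best_bound = None, float('-inf')
    for model, scores in scores_by_model.items():
        if not scores:
            return model                      # explore untried models first
        bound = (sum(scores) / len(scores)
                 + math.sqrt(2 * math.log(total) / len(scores)))
        if bound > best_bound:
            best_model, best_bound = model, bound
    return best_model

class ToyTuner:
    """Stand-in tuner: random search over integer ranges."""
    def __init__(self, ranges):
        self.ranges = ranges
        self.scores = []
    def propose(self):
        return {k: random.randint(lo, hi)
                for k, (lo, hi) in self.ranges.items()}
    def record(self, params, score):
        self.scores.append(score)

# Toy objectives standing in for fitting and scoring the real models.
objectives = {
    'RF': lambda p: 0.9 + random.uniform(-0.05, 0.05),
    'SVC': lambda p: 0.3 + random.uniform(-0.05, 0.05),
}
tuners = {
    'RF': ToyTuner({'n_estimators': (10, 500)}),
    'SVC': ToyTuner({'C': (1, 10)}),
}

random.seed(0)
for _ in range(40):
    choice = ucb1_select({name: t.scores for name, t in tuners.items()})
    params = tuners[choice].propose()       # new candidate for that model
    score = objectives[choice](params)      # evaluate (toy score here)
    tuners[choice].record(params, score)    # feed the score back
```

Because 'RF' consistently scores higher in this toy setup, UCB1 concentrates most trials on it while the exploration bonus still sends a few trials to 'SVC'.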