We provide a thorough treatment of hyperparameter optimisation for three data descriptors with a good track record in the literature: Support Vector Machine (SVM), Nearest Neighbour Distance (NND) and Average Localised Proximity (ALP). The hyperparameters of SVM have to be optimised through cross-validation, whereas NND and ALP allow a single nearest-neighbour query to be reused and admit an efficient form of leave-one-out validation. We experimentally evaluate the effect of hyperparameter optimisation on 246 classification problems drawn from 50 datasets. Among a selection of optimisation algorithms, the recent Malherbe-Powell proposal optimises the hyperparameters of all three data descriptors most efficiently. We calculate the increase in test AUROC and the amount of overfitting as a function of the number of hyperparameter evaluations. After 50 evaluations, ALP and SVM both significantly outperform NND. The performance of ALP and SVM is comparable, but ALP can be optimised more efficiently, while choosing between ALP and SVM on the basis of validation AUROC gives the best overall result. This distils the many variables of one-class classification with hyperparameter optimisation down to a clear choice with a known trade-off, allowing practitioners to make informed decisions.
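To make the reuse of a single nearest-neighbour query concrete, the following is a minimal sketch for NND only, not the authors' implementation. It assumes that the NND score of a point is the Euclidean distance to its k-th nearest target-class training instance, and that leave-one-out scores for training instances can be obtained by discarding each instance's self-match. The function names (`sorted_neighbour_distances`, `auroc`, `select_k`) and the brute-force distance computation are illustrative choices; the abstract does not specify the exact validation scheme.

```python
import numpy as np

def sorted_neighbour_distances(queries, targets, k_max):
    """Euclidean distances from each query to its k_max nearest targets, ascending."""
    diffs = queries[:, None, :] - targets[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return np.sort(dists, axis=1)[:, :k_max]

def auroc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative; ties count as one half."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

def select_k(target_train, other_val, k_values):
    """Pick k for NND by validation AUROC, using one neighbour query per point set.

    Assumption: target-class validation scores come from leave-one-out on the
    training set, and other-class instances are only used for validation.
    """
    k_max = max(k_values)
    # Training set queried against itself: column 0 is the self-match
    # (distance 0), so fetch k_max + 1 neighbours and drop that column.
    loo = sorted_neighbour_distances(target_train, target_train, k_max + 1)[:, 1:]
    out = sorted_neighbour_distances(other_val, target_train, k_max)
    results = {}
    for k in k_values:
        # Every k reads a column of the same precomputed arrays; no new query.
        # Larger distance means more outlying, so other-class points are "positive".
        results[k] = auroc(out[:, k - 1], loo[:, k - 1])
    best_k = max(results, key=results.get)
    return best_k, results

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, size=(60, 2))  # synthetic target class
    other = rng.normal(6.0, 1.0, size=(30, 2))   # synthetic other class
    print(select_k(target, other, [1, 3, 5]))
```

The point of the sketch is that the cost of evaluating an additional value of k is a column lookup, whereas a model such as SVM would have to be refitted for each hyperparameter setting and each cross-validation fold.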