In large-scale classification problems, the data set is frequently updated as parts of the data are added to or removed from the original data set. In this case, conventional incremental learning, which updates an existing classifier by explicitly modeling the data modification, is more efficient than retraining a new classifier from scratch. However, we are sometimes more interested in determining whether the classifier should be updated at all, or in performing sensitivity analysis tasks. To deal with such tasks, we propose an algorithm that makes rational inferences about the updated linear classifier without exactly updating it. Specifically, the proposed algorithm estimates upper and lower bounds on the updated classifier's coefficient matrix with a low computational cost that is related to the size of the updated data. Both theoretical analysis and experimental results show that the proposed approach is superior to existing methods in terms of the tightness of the coefficient bounds and computational complexity.
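The sketch below is not the paper's algorithm; it is a minimal illustration of the general idea of bounding updated coefficients without retraining, using a standard strong-convexity (gradient-norm) argument for an L2-regularized logistic model. All names (e.g. `coef_bounds_after_update`) and the specific bound are assumptions for illustration: because the old coefficients are optimal for the old data, the gradient of the updated objective at the old solution involves only the added and removed samples, which yields a coefficient-wise interval at a cost proportional to the size of the modification.

```python
# Conceptual sketch only (assumed technique: strong-convexity / gradient-norm
# bound), not the bound derived in the paper. Objective is the unnormalized
# sum of logistic losses plus (lam/2)||w||^2, which is lam-strongly convex.
import numpy as np
from scipy.optimize import minimize


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _objective(w, X, y, lam):
    """Return sum_i log(1 + exp(-y_i w.x_i)) + lam/2 ||w||^2 and its gradient."""
    margins = y * (X @ w)
    loss = np.sum(np.logaddexp(0.0, -margins)) + 0.5 * lam * (w @ w)
    grad = -(X * (y * _sigmoid(-margins))[:, None]).sum(axis=0) + lam * w
    return loss, grad


def fit(X, y, lam):
    """Train the original classifier (done once, on the original data)."""
    d = X.shape[1]
    res = minimize(_objective, np.zeros(d), args=(X, y, lam),
                   jac=True, method="L-BFGS-B")
    return res.x


def coef_bounds_after_update(w_old, X_add, y_add, X_rem, y_rem, lam):
    """Elementwise bounds on the retrained coefficients, without retraining.

    For a lam-strongly convex objective with new minimizer w_new,
    ||w_new - w_old|| <= ||grad_new(w_old)|| / lam.  Since w_old is optimal
    for the old data, grad_new(w_old) depends only on the modified samples,
    so the cost scales with the size of the data modification.
    """
    def loss_grad(X, y):
        if len(y) == 0:
            return np.zeros_like(w_old)
        m = y * (X @ w_old)
        return -(X * (y * _sigmoid(-m))[:, None]).sum(axis=0)

    g = loss_grad(X_add, y_add) - loss_grad(X_rem, y_rem)  # O(#modified) work
    radius = np.linalg.norm(g) / lam
    return w_old - radius, w_old + radius
```

Under this kind of bound, exact retraining is only needed when the interval is too wide for the decision at hand (e.g. a coefficient's sign cannot be determined), which is the spirit of deciding whether the classifier should be updated without actually updating it.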