This paper addresses multiclass classification with corrupted (noisy) bandit feedback. In this setting, the learner may not receive the true feedback; instead, it receives feedback that has been flipped with some non-zero probability. We propose a novel approach to handling noisy bandit feedback based on the unbiased estimator technique. We further offer a method that efficiently estimates the noise rates, yielding an end-to-end framework. The proposed algorithm enjoys a mistake bound of order $O(\sqrt{T})$ in the high noise case and of order $O(T^{\nicefrac{2}{3}})$ in the worst case. We demonstrate the approach's effectiveness through extensive experiments on several benchmark datasets.
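To illustrate the general idea behind an unbiased estimator for flipped feedback, here is a minimal sketch. It is not the paper's algorithm: the abstract does not specify the noise model, so the sketch assumes binary feedback (1 = prediction correct, 0 = incorrect) flipped with known rates `rho0` (a true 0 observed as 1) and `rho1` (a true 1 observed as 0), with `rho0 + rho1 != 1`. All function and parameter names are hypothetical.

```python
import random


def debias_feedback(noisy_feedback: int, rho0: float, rho1: float) -> float:
    """Return an unbiased estimate of the true binary feedback.

    Assumed noise model (not taken from the paper):
      rho0 = P(observe 1 | true feedback is 0)
      rho1 = P(observe 0 | true feedback is 1)
    Under this model E[noisy | true] = rho0 + (1 - rho0 - rho1) * true,
    so inverting that affine map gives an estimate whose mean is `true`.
    """
    denom = 1.0 - rho0 - rho1
    if abs(denom) < 1e-12:
        raise ValueError("rho0 + rho1 must differ from 1")
    return (noisy_feedback - rho0) / denom


def noisy_observation(true_feedback: int, rho0: float, rho1: float,
                      rng: random.Random) -> int:
    """Flip the true feedback with the class-dependent noise rate."""
    flip_prob = rho1 if true_feedback == 1 else rho0
    return 1 - true_feedback if rng.random() < flip_prob else true_feedback


def average_estimate(true_feedback: int, rho0: float, rho1: float,
                     n: int = 200_000, seed: int = 0) -> float:
    """Average the debiased estimate over n simulated noisy observations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        observed = noisy_observation(true_feedback, rho0, rho1, rng)
        total += debias_feedback(observed, rho0, rho1)
    return total / n
```

Individual estimates can fall outside [0, 1]; only their expectation matches the true feedback. The estimator's variance grows as `rho0 + rho1` approaches 1, which is one reason noise rates matter for the resulting guarantees.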