We investigate the problem of stream-based active learning in non-parametric regimes, where the labels are stochastically generated by a class of functions on which we make no assumptions whatsoever. We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines both the feature space the algorithm operates on and the learned model computed on top of it. Since the shape of the label-requesting threshold is tightly linked to the complexity of the function to be learned, which is a priori unknown, we also derive a version of the algorithm that is agnostic to any such prior knowledge. This algorithm relies on a regret-balancing scheme to solve the resulting online model-selection problem, and is computationally efficient. We prove joint guarantees on the cumulative regret and on the number of requested labels, both depending on the complexity of the labeling function at hand. In the linear case, these guarantees recover known minimax results on the generalization error as a function of the label complexity in the standard statistical learning setting.