The ever-growing size of datasets renders well-studied learning techniques, such as Kernel Ridge Regression, inapplicable, posing a serious computational challenge. Divide-and-conquer is a common remedy: split the dataset into disjoint partitions, obtain the local estimates, and average them. This allows one to scale up an otherwise infeasible base approach. In the present study we propose a fully data-driven approach to quantifying the uncertainty of the averaged estimator. Namely, we construct simultaneous element-wise confidence bands for the predictions yielded by the averaged estimator on a given deterministic prediction set. The novel approach features rigorous theoretical guarantees for a wide class of base learners, with Kernel Ridge Regression being a special case. As a by-product of our analysis, we also obtain a sup-norm consistency result for divide-and-conquer Kernel Ridge Regression. A simulation study supports the theoretical findings.
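The divide-and-conquer scheme described above (split, fit locally, average) can be sketched as follows. This is a minimal illustration, not the authors' method: the RBF kernel, the regularization level, and all function names are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    # Local KRR estimator: solve (K + n*lam*I) alpha = y.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha

def krr_predict(model, X_new, gamma=1.0):
    X, alpha = model
    return rbf_kernel(X_new, X, gamma) @ alpha

def dac_krr_predict(X, y, X_new, m=5, lam=1e-2, gamma=1.0, seed=0):
    # Divide-and-conquer: split the data into m disjoint partitions,
    # fit a local KRR on each, and average the local predictions.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    preds = []
    for part in np.array_split(idx, m):
        model = krr_fit(X[part], y[part], lam, gamma)
        preds.append(krr_predict(model, X_new, gamma))
    return np.mean(preds, axis=0)
```

Each partition requires solving only an (n/m)-by-(n/m) linear system instead of one n-by-n system, which is the source of the computational savings; the paper's contribution is the accompanying uncertainty quantification for the averaged predictions.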