A Theory of Dynamic Benchmarks

Dynamic benchmarks interweave model fitting and data collection in an attempt to mitigate the limitations of static benchmarks. In contrast to an extensive theoretical and empirical study of the static setting, the dynamic counterpart lags behind due to limited empirical studies and no apparent theoretical foundation to date. Responding to this deficit, we initiate a theoretical study of dynamic benchmarking. We examine two realizations, one capturing current practice and the other modeling more complex settings. In the first model, where data collection and model fitting alternate sequentially, we prove that model performance improves initially but can stall after only three rounds. Label noise arising from, for instance, annotator disagreement leads to even stronger negative results. Our second model generalizes the first to the case where data collection and model fitting have a hierarchical dependency structure. We show that this design guarantees strictly more progress than the first, albeit at a significant increase in complexity. We support our theoretical analysis by simulating dynamic benchmarks on two popular datasets. These results illuminate the benefits and practical limitations of dynamic benchmarking, providing both a theoretical foundation and a causal explanation for observed bottlenecks in empirical work.
