In this paper, we present and illustrate new tools for rigorously analyzing training data selection methods. These tools focus on the information-theoretic losses that occur when data is sampled. We use this framework to prove that two methods, Facility Location Selection and Transductive Experimental Design, reduce these losses. The two proofs are meant to serve as generalizable theoretical examples of applying Information Theoretic Deep Learning Theory to data selection and active learning. Both analyses yield insight into their respective methods and increase their interpretability. In the case of Transductive Experimental Design, the analysis also greatly extends the method's scope.