Deep learning on large-scale data is now dominant. The unprecedented scale of data has arguably been one of the most important driving forces behind the success of deep learning. However, there are still scenarios where collecting data or labels can be extremely expensive, e.g., medical imaging and robotics. To fill this gap, this paper considers the problem of data-efficient learning from scratch using a small amount of representative data. First, we characterize this problem as active learning on homeomorphic tubes of spherical manifolds, which naturally generates a feasible hypothesis class. Using homologous topological properties, we identify an important connection: finding tube manifolds is equivalent to minimizing hyperspherical energy (MHE) in physical geometry. Inspired by this connection, we propose an MHE-based active learning (MHEAL) algorithm and provide comprehensive theoretical guarantees for MHEAL, covering convergence and generalization analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide range of data-efficient learning applications, including deep clustering, distribution matching, version space sampling, and deep active learning.
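To make the notion of minimizing hyperspherical energy concrete, the sketch below spreads a handful of points over the unit sphere by lowering their pairwise repulsion energy. The abstract does not state the energy's exact form, so this assumes the commonly used Riesz s-energy (the sum of 1 / ||p_i - p_j||^s over pairs) and a simple projected gradient descent with step halving. It is an illustration only, not the MHEAL algorithm, and all function names are hypothetical.

```python
import math
import random


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def hyperspherical_energy(points, s=1.0):
    """Assumed Riesz s-energy: sum over pairs i < j of 1 / ||p_i - p_j||^s."""
    energy = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            energy += 1.0 / math.dist(points[i], points[j]) ** s
    return energy


def energy_gradient(points, s=1.0):
    """Euclidean gradient of the energy with respect to every point."""
    dim = len(points[0])
    grads = [[0.0] * dim for _ in points]
    for i in range(len(points)):
        for j in range(len(points)):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            coeff = -s / d ** (s + 2)
            for k in range(dim):
                grads[i][k] += coeff * (points[i][k] - points[j][k])
    return grads


def minimize_energy(points, s=1.0, steps=200, lr=0.1):
    """Projected gradient descent; a step is kept only if the energy drops."""
    points = [normalize(p) for p in points]
    energy = hyperspherical_energy(points, s)
    for _ in range(steps):
        grads = energy_gradient(points, s)
        candidate = [
            normalize([x - lr * g for x, g in zip(p, gp)])
            for p, gp in zip(points, grads)
        ]
        candidate_energy = hyperspherical_energy(candidate, s)
        if candidate_energy < energy:
            points, energy = candidate, candidate_energy
        else:
            lr *= 0.5  # step was too large; shrink it and retry
    return points, energy


if __name__ == "__main__":
    random.seed(0)
    start = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
    before = hyperspherical_energy([normalize(p) for p in start])
    spread, after = minimize_energy(start)
    print(f"energy before: {before:.4f}, after: {after:.4f}")
```

Lower energy corresponds to points that are more evenly spread over the sphere, which is the intuition behind using such an objective to pick a small, representative subset. How MHEAL connects this objective to tube manifolds and sample selection is specified in the paper itself, not here.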