We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from a source distribution to a target distribution using only target samples either lack a tractable numerical implementation ($f$-divergence flows) or require strong assumptions, together with modifications such as noise injection, to ensure convergence (Maximum Mean Discrepancy flows). In contrast, the DrMMD flow can simultaneously (i) guarantee near-global convergence for a broad class of targets in both continuous and discrete time, and (ii) be implemented in closed form using only samples. The former is achieved by exploiting the connection between the DrMMD and the $\chi^2$-divergence, while the latter follows from treating the DrMMD as an MMD with a de-regularized kernel. Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime. The potential of the DrMMD flow is demonstrated in several numerical experiments, including a large-scale setting in which student/teacher networks are trained.
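To make claim (ii) concrete, the sketch below shows the closed-form, samples-only particle update shared by MMD-type flows: each particle moves along the negative gradient of the kernel witness function between the particle measure and the target samples. Per the abstract, the DrMMD flow has the same structure with the fixed kernel replaced by a de-regularized kernel whose parameter is adapted along the flow; the Gaussian kernel, step size, and function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Pairwise Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_flow_step(X, Y, step=0.5, sigma=1.0):
    """One explicit Euler step of a generic MMD particle flow.

    Moves particles X along the negative gradient of the MMD witness
    function between the particle measure and the target samples Y;
    every quantity is available in closed form from samples alone.
    The DrMMD flow (per the abstract) shares this structure, with a
    de-regularized kernel in place of this fixed Gaussian kernel.
    """
    diff_xx = X[:, None, :] - X[None, :, :]          # (n, n, d)
    diff_xy = X[:, None, :] - Y[None, :, :]          # (n, m, d)
    Kxx = gaussian_kernel(X, X, sigma)               # (n, n)
    Kxy = gaussian_kernel(X, Y, sigma)               # (n, m)
    # For the Gaussian kernel, grad_z k(z, y) = -(z - y) / sigma^2 * k(z, y).
    grad_xx = -(diff_xx / sigma ** 2) * Kxx[:, :, None]
    grad_xy = -(diff_xy / sigma ** 2) * Kxy[:, :, None]
    # Witness-function gradient evaluated at each particle.
    velocity = grad_xx.mean(axis=1) - grad_xy.mean(axis=1)
    return X - step * velocity

# Usage: flow particles centered at 4 toward a standard Gaussian target,
# accessing the target only through its samples Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + 4.0   # source particles
Y = rng.normal(size=(300, 2))         # target samples
for _ in range(500):
    X = mmd_flow_step(X, Y)
```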