While theoretically appealing, the application of the Wasserstein distance to large-scale machine learning problems has been hampered by its prohibitive computational cost. The sliced Wasserstein distance and its variants improve computational efficiency through random projections, yet they suffer from low accuracy when the number of projections is not sufficiently large, because the majority of projections yield trivially small values. In this work, we propose a new family of distance metrics, called augmented sliced Wasserstein distances (ASWDs), constructed by first mapping samples to higher-dimensional hypersurfaces parameterized by neural networks. The construction is motivated by a key observation: (random) linear projections of samples residing on these hypersurfaces translate to much more flexible nonlinear projections in the original sample space, so they can capture complex structures of the data distribution. We show that the hypersurfaces can be optimized efficiently by gradient ascent. We provide the condition under which the ASWD is a valid metric and show that it can be satisfied by an injective neural network architecture. Numerical results demonstrate that the ASWD significantly outperforms other Wasserstein variants on both synthetic and real-world problems.
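To make the construction concrete, the sketch below is a minimal NumPy illustration (not the authors' implementation) of the two ingredients described above: a Monte Carlo estimate of the sliced Wasserstein distance via random projections, and an "augmented" variant that first lifts samples to a higher-dimensional hypersurface. The lifting map `phi` here is a fixed, hypothetical stand-in for the learned neural network; its form `x -> [x, g(x)]` is injective by construction, since `x` is recoverable from the first block of the output.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, rng=None):
    """Monte Carlo estimate of the sliced Wasserstein-2 distance.

    Projects both sample sets (of equal size) onto random unit directions;
    each 1-D Wasserstein distance reduces to comparing sorted projections.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    theta = rng.standard_normal((n_proj, d))          # random directions
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    x_proj = np.sort(X @ theta.T, axis=0)             # shape (n, n_proj)
    y_proj = np.sort(Y @ theta.T, axis=0)
    return np.sqrt(np.mean((x_proj - y_proj) ** 2))

def augmented_sliced_wasserstein(X, Y, phi, n_proj=100, rng=None):
    """ASWD-style estimate: lift samples with an injective map phi, then
    slice in the augmented space, so linear slices there act as nonlinear
    projections in the original space."""
    return sliced_wasserstein(phi(X), phi(Y), n_proj=n_proj, rng=rng)

# Hypothetical fixed lifting x -> [x, tanh(xW)]; in the paper this role is
# played by a trained neural network, here W is just a frozen random matrix.
_W = np.random.default_rng(0).standard_normal((2, 8))

def phi(X):
    return np.concatenate([X, np.tanh(X @ _W)], axis=1)
```

For example, two standard Gaussian samples shifted apart by a constant give a clearly positive distance under both estimators, while identical samples give zero; in the actual method, `phi` would be trained by gradient ascent to make the projections maximally discriminative.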