We consider a general task called partial Wasserstein covering with the goal of providing information on what patterns are not being taken into account in a dataset (e.g., dataset used during development) compared with another dataset(e.g., dataset obtained from actual applications). We model this task as a discrete optimization problem with partial Wasserstein divergence as an objective function. Although this problem is NP-hard, we prove that it satisfies the submodular property, allowing us to use a greedy algorithm with a 0.63 approximation. However, the greedy algorithm is still inefficient because it requires solving linear programming for each objective function evaluation. To overcome this inefficiency, we propose quasi-greedy algorithms that consist of a series of acceleration techniques, such as sensitivity analysis based on strong duality and the so-called C-transform in the optimal transport field. Experimentally, we demonstrate that we can efficiently fill in the gaps between the two datasets and find missing scene in real driving scenes datasets.
翻译:我们认为一项总任务,即部分瓦森斯坦(Wasserstein),目的是提供与另一数据集(例如,从实际应用中获得的数据集)相比,在数据集(例如,从实际应用中获得的数据集)中未考虑到哪些模式的信息(例如,在开发过程中使用的数据集),从而提供与哪些模式不考虑的信息。我们将此任务作为独立的优化问题模型,将部分瓦森斯坦(Wasserstein)作为一个客观的功能。虽然这个问题是非常规的,但我们证明它满足了亚模式属性,允许我们用0.63近似法使用贪婪的算法。然而,贪婪算法仍然效率低下,因为它需要解决每个目标函数评价的线性编程。为克服这种效率低下,我们建议了由一系列加速技术组成的准基因算法,例如基于强烈的双重性和最佳运输场中所谓的C-变异体的敏感性分析。我们实验性地证明,我们可以有效地填补两个数据集之间的空白,并在真实驱动场景数据集中找到缺失的场景。