Given a set of dissimilarity measurements among data points, determining which metric representation is most "consistent" with the input measurements, or which metric best captures the relevant geometric features of the data, is a key step in many machine learning algorithms. Existing methods are restricted to specific kinds of metrics or to small problem sizes because of the large number of metric constraints in such problems. In this paper, we provide an active set algorithm, Project and Forget, that uses Bregman projections to solve metric constrained problems with many (possibly exponentially many) inequality constraints. We provide a theoretical analysis of Project and Forget and prove that our algorithm converges to the global optimal solution and that the $L_2$ distance of the current iterate to the optimal solution decays asymptotically at an exponential rate. We demonstrate that, using our method, we can solve large problem instances of three types of metric constrained problems: general weight correlation clustering, metric nearness, and metric learning; in each case, we outperform state-of-the-art methods with respect to CPU times and problem sizes.
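To make the active set idea concrete, the following is a minimal sketch for one of the three problem types, metric nearness, and not the authors' implementation. It assumes a squared $L_2$ objective, for which each Bregman projection reduces to an orthogonal projection onto a half-space with a dual correction, and it assumes a brute-force oracle over triangle inequalities, which is practical only for small inputs. The names `violated_triangles`, `shift`, and `project_and_forget` are hypothetical.

```python
from itertools import combinations


def violated_triangles(x, tol):
    """Brute-force separation oracle (assumed for illustration): list every
    triple (i, j, k) with x[i][j] > x[i][k] + x[k][j] + tol."""
    n = len(x)
    return [
        (i, j, k)
        for i, j in combinations(range(n), 2)
        for k in range(n)
        if k != i and k != j and x[i][j] > x[i][k] + x[k][j] + tol
    ]


def shift(x, i, j, amount):
    """Add amount to the symmetric pair of entries (i, j) and (j, i)."""
    x[i][j] += amount
    x[j][i] += amount


def project_and_forget(d, max_iters=10_000, tol=1e-9):
    """Sketch: approximate the metric closest to the symmetric matrix d
    in the squared L2 sense, keeping only active constraints in memory."""
    x = [list(row) for row in d]  # primal iterate, starts at the input
    z = {}                        # dual variable per remembered constraint
    for _ in range(max_iters):
        # Oracle step: merge newly violated constraints into the active set.
        new = violated_triangles(x, tol)
        for c in new:
            z.setdefault(c, 0.0)
        # Project step: one pass of dual-corrected projections.
        # Constraint: x[i][j] - x[i][k] - x[k][j] <= 0, squared normal norm 3.
        largest_step = 0.0
        for (i, j, k), dual in z.items():
            slack = (x[i][k] + x[k][j] - x[i][j]) / 3.0
            step = min(dual, slack)  # negative when the constraint is violated
            shift(x, i, j, step)
            shift(x, i, k, -step)
            shift(x, k, j, -step)
            z[(i, j, k)] = dual - step
            largest_step = max(largest_step, abs(step))
        # Forget step: drop constraints whose dual variable returned to zero.
        z = {c: dual for c, dual in z.items() if dual > 0.0}
        if not new and largest_step < tol:
            break
    return x
```

For example, calling `project_and_forget([[0, 1, 5], [1, 0, 1], [5, 1, 0]])` repairs the single violated triangle by lowering the long side to 4 and raising the two short sides to 2. The brute-force oracle costs cubic time per pass; a scalable variant would need a cheaper way to find violated constraints, which this sketch does not attempt.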