In the machine learning and optimization communities, there are two main approaches to the convex risk minimization problem, namely, Stochastic Approximation (SA) and Sample Average Approximation (SAA). In terms of oracle complexity (the required number of stochastic gradient evaluations), both approaches are considered equivalent on average (up to a logarithmic factor). The total complexity depends on the specific problem; however, starting from the work \cite{nemirovski2009robust}, it has been generally accepted that SA is better than SAA. Nevertheless, for large-scale problems SA may run out of memory, since storing all the data on one machine and organizing online access to it can be impossible without communication with other machines. SAA, in contrast to SA, allows parallel/distributed computations. In this paper, we shed new light on the comparison of SA and SAA for the particular problem of computing the population (regularized) Wasserstein barycenter of discrete measures. The conclusion remains valid even in the non-parallel (non-decentralized) setup.
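To make the distinction concrete, recall the standard formulation of the two approaches for the risk minimization problem (stated here for reference; the notation $F$, $X$, $\xi$, $\eta_k$ is generic and not tied to the specific barycenter objective considered later):
\[
\min_{x \in X} \; f(x) := \mathbb{E}_{\xi}\bigl[ F(x, \xi) \bigr].
\]
SA processes the samples online, performing projected stochastic gradient steps
\[
x_{k+1} = \Pi_X\bigl( x_k - \eta_k \nabla_x F(x_k, \xi_k) \bigr),
\]
whereas SAA draws a sample $\xi_1, \dots, \xi_N$ once and solves the empirical counterpart
\[
\min_{x \in X} \; \hat{f}_N(x) := \frac{1}{N} \sum_{i=1}^{N} F(x, \xi_i),
\]
a deterministic problem whose sum structure admits parallel/distributed computation by partitioning the sample across machines.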