We investigate the fundamental optimization problem of minimizing a target function $f(x)$ whose gradients are expensive to compute or of limited availability, given access to some auxiliary side function $h(x)$ whose gradients are cheap or more readily available. This formulation captures many settings of practical relevance, such as i) re-using batches in SGD, ii) transfer learning, iii) federated learning, and iv) training with compressed models/dropout. We propose two new generic algorithms that are applicable in all these settings and prove, using only an assumption on the Hessian similarity between the target and the side information, that we can benefit from this framework.
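As a rough illustration of the general flavor of this framework (not the paper's actual algorithms), the sketch below shows a shifted-gradient scheme: the expensive target gradient is queried only occasionally at an anchor point, and the cheap side gradients are used for the remaining steps with a fixed correction. All names (`grad_f`, `grad_h`, step sizes) are hypothetical placeholders; the correction stays accurate near the anchor precisely when the Hessians of $f$ and $h$ are similar.

```python
# Minimal illustrative sketch (assumed scheme, not the paper's exact method):
# exploit a cheap side function h when gradients of the target f are expensive.
import numpy as np

def shifted_gradient_descent(x0, grad_f, grad_h, step_size=0.1,
                             n_outer=20, n_inner=10):
    """Minimize f using cheap gradients of h, corrected by occasional
    expensive evaluations of grad_f at an anchor point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        anchor = x.copy()
        # One expensive query of the target gradient per outer round.
        shift = grad_f(anchor) - grad_h(anchor)
        for _ in range(n_inner):
            # Cheap inner steps: side gradient plus the fixed correction.
            # The correction is exact at the anchor and remains a good
            # approximation nearby when the Hessians of f and h are close.
            x = x - step_size * (grad_h(x) + shift)
    return x

# Toy usage: f and h are quadratics with similar curvature (similar Hessians).
if __name__ == "__main__":
    A = np.diag([1.0, 2.0])   # Hessian of the target f
    B = np.diag([1.1, 1.9])   # Hessian of the side function h, close to A
    grad_f = lambda x: A @ x
    grad_h = lambda x: B @ x
    x_final = shifted_gradient_descent(np.array([5.0, -3.0]), grad_f, grad_h)
    print(x_final)            # should be close to the minimizer [0, 0]
```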