Scalable and efficient distributed learning is one of the main driving forces behind the recent rapid advancement of machine learning and artificial intelligence. One prominent feature of this topic is that recent progress has been made by researchers in two communities: (1) the systems community, including databases, data management, and distributed systems, and (2) the machine learning and mathematical optimization community. The interaction and knowledge sharing between these two communities has led to the rapid development of new distributed learning systems and theory. In this work, we hope to provide a brief introduction to some recently developed distributed learning techniques, namely lossy communication compression (e.g., quantization and sparsification), asynchronous communication, and decentralized communication. One special focus of this work is on making sure that it can be easily understood by researchers in both communities: on the system side, we rely on a simplified system model that hides many system details not necessary for the intuition behind the system speedups; on the theory side, we rely on minimal assumptions and significantly simplify the proofs of some recent work to achieve comparable results.
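To give a concrete sense of what "lossy communication compression" means here, the following is a minimal, self-contained sketch of the two techniques named in the abstract, top-k sparsification and unbiased stochastic quantization, applied to a gradient vector before it is communicated. The function names and parameters are illustrative choices for this sketch, not APIs from any particular system.

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, k: int):
    """Sparsification: keep only the k largest-magnitude entries of a
    gradient; only (indices, values) are sent over the network."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def stochastic_quantize(grad: np.ndarray, levels: int = 256):
    """Quantization: map each entry onto `levels` uniform levels, rounding
    up with probability equal to the fractional part so the compressed
    gradient is an unbiased estimate of the original."""
    scale = np.max(np.abs(grad))
    if scale == 0.0:
        return np.zeros_like(grad, dtype=np.int16), scale
    normalized = np.abs(grad) / scale * (levels - 1)
    lower = np.floor(normalized)
    q = lower + (np.random.rand(*grad.shape) < (normalized - lower))
    # Receiver reconstructs as sign * q / (levels - 1) * scale.
    return (np.sign(grad) * q).astype(np.int16), scale

# A worker would transmit (idx, values) or (q, scale) instead of the
# dense float32 gradient, trading communication volume for noise.
grad = np.random.randn(1_000_000).astype(np.float32)
idx, values = topk_sparsify(grad, k=10_000)
q, scale = stochastic_quantize(grad)
```

Both compressors reduce the bytes each worker must exchange per iteration; the theoretical question surveyed in this work is how the resulting compression noise affects convergence.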