In order to mitigate the high communication cost in distributed and federated learning, various vector compression schemes, such as quantization, sparsification and dithering, have become very popular. In designing a compression method, one aims to communicate as few bits as possible, which minimizes the cost per communication round, while at the same time attempting to impart as little distortion (variance) to the communicated messages as possible, which minimizes the adverse effect of the compression on the overall number of communication rounds. However, intuitively, these two goals are fundamentally in conflict: the more compression we allow, the more distorted the messages become. We formalize this intuition and prove an {\em uncertainty principle} for randomized compression operators, thus quantifying this limitation mathematically, and {\em effectively providing asymptotically tight lower bounds on what might be achievable with communication compression}. Motivated by these developments, we call for the search for the optimal compression operator. In an attempt to take a first step in this direction, we consider an unbiased compression method inspired by the Kashin representation of vectors, which we call {\em Kashin compression (KC)}. In contrast to all previously proposed compression mechanisms, KC enjoys a {\em dimension-independent} variance bound, for which we derive an explicit formula even in the regime when only a few bits need to be communicated per vector entry.