Linear real-valued computations over distributed datasets are common in many applications, most notably as part of machine learning inference. In particular, linear computations that are quantized, i.e., whose coefficients are restricted to a predetermined set of values (such as $\pm 1$), have gained increasing interest lately due to their role in efficient, robust, or private machine learning models. Given a dataset to store in a distributed system, we wish to encode it so that every such computation can be performed by accessing a small number of servers, called the access parameter of the system. Doing so frees the remaining servers to execute other tasks. Minimizing the access parameter gives rise to an access-redundancy tradeoff, where a smaller access parameter requires more redundancy in the system, and vice versa. In this paper, we study this tradeoff and provide several explicit low-access schemes for $\{\pm1\}$ quantized linear computations that employ covering codes in a novel way. While the connection to covering codes has been observed in the past, our results strictly outperform the state of the art for two-valued linear computations. We further show that the same storage scheme can be used to retrieve any linear combination with two distinct coefficients, regardless of what those coefficients are, with the same access parameter. This universality result is then extended to all possible quantizations with any number of values; while the storage remains identical, the access parameter increases according to a new additive-combinatorics property we call coefficient complexity. Finally, we study the coefficient complexity itself: we characterize the complexity of small coefficient sets, provide bounds, and identify the coefficient sets of highest and lowest complexity.
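To make the two-valued universality claim concrete, the following elementary identity (an illustrative sketch under assumed notation, not necessarily the construction used in the paper) shows how a linear combination whose coefficients take two arbitrary values $a$ and $b$ reduces to a $\{\pm1\}$-valued combination plus the sum of all entries. Here $\mathbf{x}\in\mathbb{R}^n$ denotes the data vector, $\mathbf{w}$ the coefficient vector with $w_i\in\{a,b\}$, and $\mathbf{s}\in\{\pm1\}^n$ the sign vector with $s_i=+1$ whenever $w_i=a$ and $s_i=-1$ whenever $w_i=b$:
\[
  w_i \;=\; \frac{a+b}{2} \;+\; \frac{a-b}{2}\, s_i,
  \qquad
  \mathbf{w}^\top \mathbf{x}
  \;=\; \frac{a+b}{2} \sum_{i=1}^{n} x_i
  \;+\; \frac{a-b}{2}\, \mathbf{s}^\top \mathbf{x}.
\]
Hence, if a storage scheme supports low-access retrieval of $\{\pm1\}$-valued combinations together with the fixed sum $\sum_i x_i$, it also supports any combination with coefficients in $\{a,b\}$, which is consistent with the universality statement above.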