We propose a new decentralized average consensus algorithm with compressed communication that scales linearly with the network size n. We prove that the proposed method converges to the average of the initial values held locally by the agents of a network when agents are allowed to communicate with compressed messages. The proposed algorithm works for a broad class of compression operators (possibly biased), where agents interact over arbitrary static, undirected, and connected networks. We further present numerical experiments that confirm our theoretical results and illustrate the scalability and communication efficiency of our algorithm.

### 相关内容

IFIP TC13 Conference on Human-Computer Interaction是人机交互领域的研究者和实践者展示其工作的重要平台。多年来，这些会议吸引了来自几个国家和文化的研究人员。官网链接：http://interact2019.org/

The goal of many scientific experiments including A/B testing is to estimate the average treatment effect (ATE), which is defined as the difference between the expected outcomes of two or more treatments. In this paper, we consider a situation where an experimenter can assign a treatment to research subjects sequentially. In adaptive experimental design, the experimenter is allowed to change the probability of assigning a treatment using past observations for estimating the ATE efficiently. However, with this approach, it is difficult to apply a standard statistical method to construct an estimator because the observations are not independent and identically distributed. We thus propose an algorithm for efficient experiments with estimators constructed from dependent samples. We also introduce a sequential testing framework using the proposed estimator. To justify our proposed approach, we provide finite and infinite sample analyses. Finally, we experimentally show that the proposed algorithm exhibits preferable performance.

The stochastic multi-armed bandit (MAB) problem is a common model for sequential decision problems. In the standard setup, a decision maker has to choose at every instant between several competing arms, each of them provides a scalar random variable, referred to as a "reward." Nearly all research on this topic considers the total cumulative reward as the criterion of interest. This work focuses on other natural objectives that cannot be cast as a sum over rewards, but rather more involved functions of the reward stream. Unlike the case of cumulative criteria, in the problems we study here the oracle policy, that knows the problem parameters a priori and is used to "center" the regret, is not trivial. We provide a systematic approach to such problems, and derive general conditions under which the oracle policy is sufficiently tractable to facilitate the design of optimism-based (upper confidence bound) learning policies. These conditions elucidate an interesting interplay between the arm reward distributions and the performance metric. Our main findings are illustrated for several commonly used objectives such as conditional value-at-risk, mean-variance trade-offs, Sharpe-ratio, and more.

We propose \texttt{FedGLOMO}, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works. Our key algorithmic idea that enables achieving this improved complexity is based on the observation that the convergence in FL is hampered by two sources of high variance: (i) the global server aggregation step with multiple local updates, exacerbated by client heterogeneity, and (ii) the noise of the local client-level stochastic gradients. By modeling the server aggregation step as a generalized gradient-type update, we propose a variance-reducing momentum-based global update at the server, which when applied in conjunction with variance-reduced local updates at the clients, enables \texttt{FedGLOMO} to enjoy an improved convergence rate. Moreover, we derive our results under a novel and more realistic client-heterogeneity assumption which we verify empirically -- unlike prior assumptions that are hard to verify. Our experiments illustrate the intrinsic variance reduction effect of \texttt{FedGLOMO}, which implicitly suppresses client-drift in heterogeneous data distribution settings and promotes communication efficiency.

Reconstruction of signals from undersampled and noisy measurements is a topic of considerable interest. Sharpness conditions directly control the recovery performance of restart schemes for first-order methods without the need for restrictive assumptions such as strong convexity. However, they are challenging to apply in the presence of noise or approximate model classes (e.g., approximate sparsity). We provide a first-order method: Weighted, Accelerated and Restarted Primal-dual (WARPd), based on primal-dual iterations and a novel restart-reweight scheme. Under a generic approximate sharpness condition, WARPd achieves stable linear convergence to the desired vector. Many problems of interest fit into this framework. For example, we analyze sparse recovery in compressed sensing, low-rank matrix recovery, matrix completion, TV regularization, minimization of $\|Bx\|_{l^1}$ under constraints ($l^1$-analysis problems for general $B$), and mixed regularization problems. We show how several quantities controlling recovery performance also provide explicit approximate sharpness constants. Numerical experiments show that WARPd compares favorably with specialized state-of-the-art methods and is ideally suited for solving large-scale problems. We also present a noise-blind variant based on the Square-Root LASSO decoder. Finally, we show how to unroll WARPd as neural networks. This approximation theory result provides lower bounds for stable and accurate neural networks for inverse problems and sheds light on architecture choices. Code and a gallery of examples are made available online as a MATLAB package.

Personalized Federated Learning (PFL) has recently seen tremendous progress, allowing the design of novel machine learning applications to preserve the privacy of the training data. Existing theoretical results in this field mainly focus on distributed optimization for minimization problems. This paper is the first to study PFL for saddle point problems (which cover a broader class of optimization problems), allowing for a more rich class of applications requiring more than just solving minimization problems. In this work, we consider a recently proposed PFL setting with the mixing objective function, an approach combining the learning of a global model together with locally distributed learners. Unlike most previous work, which considered only the centralized setting, we work in a more general and decentralized setup that allows us to design and analyze more practical and federated ways to connect devices to the network. We proposed new algorithms to address this problem and provide a theoretical analysis of the smooth (strongly-)convex-(strongly-)concave saddle point problems in stochastic and deterministic cases. Numerical experiments for bilinear problems and neural networks with adversarial noise demonstrate the effectiveness of the proposed methods.

To leverage massive distributed data and computation resources, machine learning in the network edge is considered to be a promising technique especially for large-scale model training. Federated learning (FL), as a paradigm of collaborative learning techniques, has obtained increasing research attention with the benefits of communication efficiency and improved data privacy. Due to the lossy communication channels and limited communication resources (e.g., bandwidth and power), it is of interest to investigate fast responding and accurate FL schemes over wireless systems. Hence, we investigate the problem of jointly optimized communication efficiency and resources for FL over wireless Internet of things (IoT) networks. To reduce complexity, we divide the overall optimization problem into two sub-problems, i.e., the client scheduling problem and the resource allocation problem. To reduce the communication costs for FL in wireless IoT networks, a new client scheduling policy is proposed by reusing stale local model parameters. To maximize successful information exchange over networks, a Lagrange multiplier method is first leveraged by decoupling variables including power variables, bandwidth variables and transmission indicators. Then a linear-search based power and bandwidth allocation method is developed. Given appropriate hyper-parameters, we show that the proposed communication-efficient federated learning (CEFL) framework converges at a strong linear rate. Through extensive experiments, it is revealed that the proposed CEFL framework substantially boosts both the communication efficiency and learning performance of both training loss and test accuracy for FL over wireless IoT networks compared to a basic FL approach with uniform resource allocation.

In this paper, we investigate the anti-jamming problem of a directional modulation (DM) system with the aid of intelligent reflecting surface (IRS). As an efficient tool to combat malicious jamming, receive beamforming (RBF) is usually designed to be on null-space of jamming channel or covariance matrix from Mallory to Bob. Thus, it is very necessary to estimate the receive jamming covariance matrix (JCM) at Bob. To achieve a precise JCM estimate, three JCM estimation methods, including eigenvalue decomposition (EVD), parametric estimation method by gradient descend (PEM-GD) and parametric estimation method by alternating optimization (PEM-AO), are proposed. Here, the proposed EVD is under rank-2 constraint of JCM. The PEM-GD method fully explores the structure features of JCM and the PEM-AO is to decrease the computational complexity of the former via dimensionality reduction. The simulation results show that in low and medium jamming-noise ratio (JNR) regions, the proposed three methods perform better than the existing sample covariance matrix method. The proposed PEM-GD and PEM-AO outperform EVD method and existing clutter and disturbance covariance estimator RCML.

We introduce a code generator that converts unoptimized C++ code operating on sparse data into vectorized and parallel CPU or GPU kernels. Our approach unrolls the computation into a massive expression graph, performs redundant expression elimination, grouping, and then generates an architecture-specific kernel to solve the same problem, assuming that the sparsity pattern is fixed, which is a common scenario in many applications in computer graphics and scientific computing. We show that our approach scales to large problems and can achieve speedups of two orders of magnitude on CPUs and three orders of magnitude on GPUs, compared to a set of manually optimized CPU baselines. To demonstrate the practical applicability of our approach, we employ it to optimize popular algorithms with applications to physical simulation and interactive mesh deformation.

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent, and $\mathcal{O}(1/t)$ for normalized gradient descent. This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual, which manages to result in a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

Masahiro Kato,Takuya Ishihara,Junya Honda,Yusuke Narita
0+阅读 · 10月26日
Asaf Cassel,Shie Mannor,Assaf Zeevi
0+阅读 · 10月26日
Rudrajit Das,Anish Acharya,Abolfazl Hashemi,Sujay Sanghavi,Inderjit S. Dhillon,Ufuk Topcu
0+阅读 · 10月24日
0+阅读 · 10月24日
Hao Chen,Shaocheng Huang,Deyou Zhang,Ming Xiao,Mikael Skoglund,H. Vincent Poor
0+阅读 · 10月22日
Hangjia He,Ting Su,Hongjun Wang,Yin Teng,Weiping Shi,Feng Shu,Jiangzhou Wang
0+阅读 · 10月22日
Philipp Herholz,Xuan Tang,Teseo Schneider,Shoaib Kamil,Daniele Panozzo,Olga Sorkine-Hornung
0+阅读 · 10月15日
Ziwei Ji,Nathan Srebro,Matus Telgarsky
4+阅读 · 7月1日
Kevin Scaman,Francis Bach,Sébastien Bubeck,Yin Tat Lee,Laurent Massoulié
7+阅读 · 2018年6月1日

8+阅读 · 2019年7月10日
CreateAMind
12+阅读 · 2019年5月22日
CreateAMind
8+阅读 · 2019年5月18日
CreateAMind
6+阅读 · 2019年1月18日
CreateAMind
7+阅读 · 2019年1月7日
CreateAMind
29+阅读 · 2019年1月3日
CreateAMind
7+阅读 · 2018年12月10日
CreateAMind
3+阅读 · 2018年4月15日

3+阅读 · 2017年8月6日
CreateAMind
9+阅读 · 2017年7月21日
Top