With the rapid development of artificial intelligence, engineering applications of the technology have been deployed one after another. The gradient descent method plays an important role in solving optimization problems due to its simple structure, good stability, and ease of implementation. In multi-node machine learning systems, gradients usually need to be shared, and data reconstruction attacks can recover training data from the gradient information alone. In this paper, to prevent gradient leakage while maintaining model accuracy, we propose a super stochastic gradient descent approach that updates parameters by concealing the modulus length of each gradient vector and converting it into a unit vector. Furthermore, we analyze the security of the stochastic gradient descent approach. Experimental results show that our approach is clearly superior to prevalent gradient descent approaches in terms of accuracy and robustness.
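The core update can be sketched as follows. This is a minimal illustration of descending along the unit gradient direction only, so that a shared update reveals direction but not magnitude; the function name and the toy objective are our own illustrative choices, not the authors' code:

```python
import numpy as np

def unit_gradient_step(params, grad, lr=0.1):
    """One update using only the gradient's direction.

    The modulus (norm) of the gradient is discarded by
    normalizing it to a unit vector before the step.
    """
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return params  # zero gradient: no update
    return params - lr * (grad / norm)

# Example: minimize f(x) = ||x||^2, whose gradient is 2x.
x = np.array([3.0, -4.0])
for _ in range(100):
    x = unit_gradient_step(x, 2 * x, lr=0.1)
print(np.linalg.norm(x))  # converges toward 0 (within lr-sized steps)
```

Because every step has fixed length `lr`, the iterate approaches the minimizer linearly and then hovers within one step of it, which is why such schemes are typically paired with a decaying learning rate.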

### Related content

This paper investigates robust recovery of an undamped or damped spectrally sparse signal from its partially revealed noisy entries within the framework of spectral compressed sensing. Nonconvex optimization approaches such as projected gradient descent (PGD) based on the low-rank Hankel matrix completion model have recently been proposed for this problem. However, the analysis of PGD relies heavily on the projection onto a feasible set involving two tuning parameters, and a theoretical guarantee in the noisy case is still missing. In this paper, we propose a vanilla gradient descent (VGD) algorithm without projection based on low-rank noisy Hankel matrix completion, and prove that VGD can achieve the sample complexity $O(K^2\log^2 N)$, where $K$ is the number of complex exponential functions and $N$ is the signal dimension, to ensure robust recovery from noisy observations when the noise parameter satisfies some mild conditions. Moreover, we show the possible performance loss of PGD, which suffers from the inevitable estimation of the above two unknown parameters of the feasible set. Numerical simulations corroborate our analysis and show that VGD achieves more stable performance than PGD when dealing with damped spectrally sparse signals.

Within the context of hybrid quantum-classical optimization, gradient descent based optimizers typically require the evaluation of expectation values with respect to the outcome of parameterized quantum circuits. In this work, we explore the consequences of the prior observation that estimation of these quantities on quantum hardware results in a form of stochastic gradient descent optimization. We formalize this notion, which allows us to show that in many relevant cases, including VQE, QAOA and certain quantum classifiers, estimating expectation values with $k$ measurement outcomes results in optimization algorithms whose convergence properties can be rigorously well understood, for any value of $k$. In fact, even using single measurement outcomes for the estimation of expectation values is sufficient. Moreover, in many settings the required gradients can be expressed as linear combinations of expectation values -- originating, e.g., from a sum over local terms of a Hamiltonian, a parameter shift rule, or a sum over data-set instances -- and we show that in these cases $k$-shot expectation value estimation can be combined with sampling over terms of the linear combination, to obtain "doubly stochastic" gradient descent optimizers. For all algorithms we prove convergence guarantees, providing a framework for the derivation of rigorous optimization results in the context of near-term quantum devices. Additionally, we explore numerically these methods on benchmark VQE, QAOA and quantum-enhanced machine learning tasks and show that treating the stochastic settings as hyper-parameters allows for state-of-the-art results with significantly fewer circuit executions and measurements.
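As a hedged classical sketch of this idea, the toy example below treats a single-parameter "circuit" whose exact expectation value is $\cos\theta$, estimates it from $k$ measurement shots, and runs SGD with single-shot ($k=1$) parameter-shift gradient estimates. The Bernoulli sampling model is our stand-in for quantum hardware, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def kshot_expectation(theta, k):
    """k-shot estimate of <Z> = cos(theta) for a toy one-qubit
    'circuit': draw k outcomes in {+1, -1} with
    P(+1) = (1 + cos(theta)) / 2 and average them."""
    p = (1 + np.cos(theta)) / 2
    shots = rng.choice([1.0, -1.0], size=k, p=[p, 1 - p])
    return shots.mean()

def kshot_gradient(theta, k):
    """Parameter-shift gradient: an unbiased estimate of
    d/dtheta cos(theta) from two shifted k-shot estimates."""
    plus = kshot_expectation(theta + np.pi / 2, k)
    minus = kshot_expectation(theta - np.pi / 2, k)
    return (plus - minus) / 2

# Stochastic gradient descent on E(theta) = cos(theta) with k = 1:
# even single-shot expectation estimates suffice for convergence.
theta = 0.3
for _ in range(2000):
    theta -= 0.05 * kshot_gradient(theta, k=1)
print(np.cos(theta))  # approaches the minimum value -1
```

The key point mirrored here is that each gradient estimate is unbiased for any shot count $k$, so standard SGD convergence arguments apply even at $k=1$.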

Rapid developments in streaming data technologies are continuing to generate increased interest in monitoring human activity. Wearable devices, such as wrist-worn sensors that monitor gross motor activity (actigraphy), have become prevalent. An actigraph unit continually records the activity level of an individual, producing a very large amount of data at a high-resolution that can be immediately downloaded and analyzed. While this kind of *big data* includes both spatial and temporal information, the variation in such data seems to be more appropriately modeled by considering stochastic evolution through time while accounting for spatial information separately. We propose a comprehensive Bayesian hierarchical modeling and inferential framework for actigraphy data reckoning with the massive sizes of such databases while attempting to offer full inference. Building upon recent developments in this field, we construct Nearest Neighbour Gaussian Processes (NNGPs) for actigraphy data to compute at large temporal scales. More specifically, we construct a temporal NNGP and we focus on the optimized implementation of the collapsed algorithm in this specific context. This approach permits improved model scaling while also offering full inference. We test and validate our methods on simulated data and subsequently apply and verify their predictive ability on an original dataset concerning a health study conducted by the Fielding School of Public Health of the University of California, Los Angeles.
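A minimal sketch of the temporal nearest-neighbour idea (with an assumed exponential covariance and illustrative hyperparameters, not the paper's actual model): each time point is conditioned only on its m most recent neighbours, replacing the dense joint Gaussian with a sparse factorization.

```python
import numpy as np

rng = np.random.default_rng(2)

def expcov(s, t, sigma2=1.0, ell=5.0):
    """Exponential covariance kernel between time grids s and t."""
    return sigma2 * np.exp(-np.abs(s[:, None] - t[None, :]) / ell)

def nngp_sample(times, m=3):
    """Draw one path from a temporal Nearest-Neighbour GP:
    each point is conditioned only on its m most recent
    predecessors, giving an O(n m^3) factorization of the
    joint density instead of a dense O(n^3) one."""
    y = np.zeros(len(times))
    for i, t in enumerate(times):
        nb = np.arange(max(0, i - m), i)  # neighbour indices
        if len(nb) == 0:
            y[i] = rng.normal(0.0, 1.0)
            continue
        C_nn = expcov(times[nb], times[nb])
        c_tn = expcov(np.array([t]), times[nb])[0]
        w = np.linalg.solve(C_nn, c_tn)   # kriging weights
        mean = w @ y[nb]
        var = 1.0 - w @ c_tn              # conditional variance
        y[i] = rng.normal(mean, np.sqrt(var))
    return y

path = nngp_sample(np.arange(200.0), m=3)
print(path.shape)
```

Each conditional costs only an m-by-m solve, which is what makes these models computable at the temporal scales of actigraphy streams.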

Black-box problems are common in real life, as in structural design, drug experiments, and machine learning. When optimizing black-box systems, decision-makers typically weigh multiple performance measures and reach a final decision through comprehensive evaluation. Motivated by such practical needs, we focus on constrained black-box problems where the objective and constraints lack known special structure, and evaluations are expensive and possibly noisy. We develop a novel constrained Bayesian optimization approach based on the knowledge gradient method ($c\text{-}\mathrm{KG}$). A new acquisition function is proposed to determine the next batch of samples, accounting for both optimality and feasibility. An unbiased estimator of the gradient of the new acquisition function is derived to implement the $c\text{-}\mathrm{KG}$ approach.

Fast and robust optimization algorithms are of critical importance in all areas of machine learning. This paper treats the task of designing optimization algorithms as an optimal control problem. Using regret as a metric for an algorithm's performance, we study the existence, uniqueness and consistency of regret-optimal algorithms. By providing first-order optimality conditions for the control problem, we show that regret-optimal algorithms must satisfy a specific structure in their dynamics, which we show is equivalent to performing dual-preconditioned gradient descent on the value function generated by their regret. Using these optimal dynamics, we provide bounds on their rates of convergence to solutions of convex optimization problems. Though closed-form optimal dynamics cannot be obtained in general, we present fast numerical methods for approximating them, generating optimization algorithms which directly optimize their long-term regret. Lastly, these are benchmarked against commonly used optimization algorithms to demonstrate their effectiveness.

We study the problem of training deep neural networks with the Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data. The key idea of our proof is that Gaussian random initialization followed by (stochastic) gradient descent produces a sequence of iterates that stay inside a small perturbation region centered around the initial weights, in which the empirical loss function of deep ReLU networks enjoys nice local curvature properties that ensure the global convergence of (stochastic) gradient descent. Our theoretical results shed light on the optimization of deep learning, and pave the way to study the optimization dynamics of training modern deep neural networks.

We present a meta-learning approach for adaptive text-to-speech (TTS) with limited data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires only a small amount of data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.

We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning have better regret performance than the known randomized online coordinate descent algorithms. Furthermore, the proposed algorithms for stochastic optimization exhibit convergence rates as good as those of the best known randomized coordinate descent algorithms. We also present simulation results to demonstrate the performance of the proposed algorithms.
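To make the per-iteration cost argument concrete, here is a hedged sketch of plain (non-accelerated) randomized coordinate descent on a quadratic; the accelerated variants proposed in the paper add momentum-style sequences on top of this basic loop:

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_coordinate_descent(A, b, iters=5000):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive
    definite) by exact minimization along one random coordinate
    per iteration. Each step costs O(n), versus O(n^2) for a
    full gradient step."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.integers(n)
        # Partial derivative along coordinate i, then an exact step.
        g_i = A[i] @ x - b[i]
        x[i] -= g_i / A[i, i]
    return x

# Small SPD system: the minimizer solves A x = b.
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
x = randomized_coordinate_descent(A, b)
print(np.linalg.norm(A @ x - b))  # near zero
```

The O(n) per-iteration cost is exactly the quantity the paper's accelerated variants preserve while improving the iteration count.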

The field of Multi-Agent System (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled them to support MAS in complex, real-time, and uncertain environments. This survey aims at providing an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.

Are we using the right potential functions in the Conditional Random Field models that are popular in the Vision community? Semantic segmentation and other pixel-level labelling tasks have made significant progress recently due to the deep learning paradigm. However, most state-of-the-art structured prediction methods also include a random field model with a hand-crafted Gaussian potential to model spatial priors, label consistencies and feature-based image conditioning. In this paper, we challenge this view by developing a new inference and learning framework which can learn pairwise CRF potentials restricted only by their dependence on the image pixel values and the size of the support. Both standard spatial and high-dimensional bilateral kernels are considered. Our framework is based on the observation that CRF inference can be achieved via projected gradient descent and consequently, can easily be integrated in deep neural networks to allow for end-to-end training. It is empirically demonstrated that such learned potentials can improve segmentation accuracy and that certain label class interactions are indeed better modelled by a non-Gaussian potential. In addition, we compare our inference method to the commonly used mean-field algorithm. Our framework is evaluated on several public benchmarks for semantic segmentation with improved performance compared to previous state-of-the-art CNN+CRF models.
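The observation that inference can be run as projected gradient descent can be sketched generically. The toy below applies PGD over the probability simplex to a single pixel's relaxed label distribution; the simplex projection and the linear cost are illustrative assumptions, not the paper's CRF energy:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex,
    via the standard sorting-based algorithm."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u > css / (np.arange(len(v)) + 1))[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def projected_gradient_descent(grad, x0, lr=0.1, iters=200):
    """Generic PGD loop: gradient step, then projection back
    onto the feasible set (here, the label simplex)."""
    x = x0
    for _ in range(iters):
        x = project_simplex(x - lr * grad(x))
    return x

# Toy relaxed labelling: minimize <c, q> over the simplex;
# the minimizer puts all mass on the cheapest label.
c = np.array([0.7, 0.2, 0.9])
q = projected_gradient_descent(lambda q: c, np.full(3, 1 / 3))
print(q)  # mass concentrates on label 1 (cost 0.2)
```

Because both steps are differentiable (almost everywhere), such an unrolled loop can be placed inside a network and trained end-to-end, which is the property the paper exploits.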

Xunmeng Wu, Zai Yang, Zongben Xu · Jan 21
Ryan Sweke, Frederik Wilde, Johannes Meyer, Maria Schuld, Paul K. Faehrmann, Barthélémy Meynard-Piganeau, Jens Eisert · Jan 20
Pierfrancesco Alaimo Di Loro, Marco Mingione, Jonah Lipsitt, Christina M. Batteate, Michael Jerrett, Sudipto Banerjee · Jan 20
Wenjie Chen, Shengcai Liu, Ke Tang · Jan 20
Philippe Casgrain, Anastasis Kratsios · Jan 19
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu · Nov 21, 2018
Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas · Sep 27, 2018
Ferdinando Fioretto, Enrico Pontelli, William Yeoh · Jan 11, 2018
Måns Larsson, Anurag Arnab, Fredrik Kahl, Shuai Zheng, Philip Torr · Jan 2, 2018
