Non-linear hierarchical models are commonly used in many disciplines. However, inference in the presence of non-nested effects and on large datasets is challenging and computationally burdensome. This paper provides two contributions to scalable and accurate inference. First, I derive a new mean-field variational algorithm for estimating binomial logistic hierarchical models with an arbitrary number of non-nested random effects. Second, I propose "marginally augmented variational Bayes" (MAVB) that further improves the initial approximation through a step of Bayesian post-processing. I prove that MAVB provides a guaranteed improvement in the approximation quality at low computational cost and induces dependencies that were assumed away by the initial factorization assumptions. I apply these techniques to a study of voter behavior using a high-dimensional application of the popular approach of multilevel regression and post-stratification (MRP). Existing estimation took hours whereas the algorithms proposed run in minutes. The posterior means are well-recovered even under strong factorization assumptions. Applying MAVB further improves the approximation by partially correcting the under-estimated variance. The proposed methodology is implemented in an open source software package.

### 相关内容

Structural equation models are commonly used to capture the relationship between sets of observed and unobservable variables. Traditionally these models are fitted using frequentist approaches but recently researchers and practitioners have developed increasing interest in Bayesian inference. In Bayesian settings, inference for these models is typically performed via Markov chain Monte Carlo methods, which may be computationally intensive for models with a large number of manifest variables or complex structures. Variational approximations can be a fast alternative; however, they have not been adequately explored for this class of models. We develop a mean field variational Bayes approach for fitting elemental structural equation models and demonstrate how bootstrap can considerably improve the variational approximation quality. We show that this variational approximation method can provide reliable inference while being significantly faster than Markov chain Monte Carlo.

We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we show that the tracking efficiency of the proximal stochastic gradient method depends only logarithmically on the initialization quality, when equipped with a step-decay schedule. Numerical experiments illustrate our results.

Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are typically controlled by free parameters that are estimated from data by maximum-likelihood estimation. However, when faced with real-world datasets many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the datasets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models, VAEs and normalising flows, from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.

Approximate inference methods like the Laplace method, Laplace approximations and variational methods, amongst others, are popular methods when exact inference is not feasible due to the complexity of the model or the abundance of data. In this paper we propose a hybrid approximate method namely Low-Rank Variational Bayes correction (VBC), that uses the Laplace method and subsequently a Variational Bayes correction to the posterior mean. The cost is essentially that of the Laplace method which ensures scalability of the method. We illustrate the method and its advantages with simulated and real data, on small and large scale.

This paper develops simple feed-forward neural networks that achieve the universal approximation property for all continuous functions with a fixed finite number of neurons. These neural networks are simple because they are designed with a simple and computable continuous activation function $\sigma$ leveraging a triangular-wave function and a softsign function. We prove that $\sigma$-activated networks with width $36d(2d+1)$ and depth $11$ can approximate any continuous function on a $d$-dimensioanl hypercube within an arbitrarily small error. Hence, for supervised learning and its related regression problems, the hypothesis space generated by these networks with a size not smaller than $36d(2d+1)\times 11$ is dense in the space of continuous functions. Furthermore, classification functions arising from image and signal classification are in the hypothesis space generated by $\sigma$-activated networks with width $36d(2d+1)$ and depth $12$, when there exist pairwise disjoint closed bounded subsets of $\mathbb{R}^d$ such that the samples of the same class are located in the same subset.

Estimating nested expectations is an important task in computational mathematics and statistics. In this paper we propose a new Monte Carlo method using post-stratification to estimate nested expectations efficiently without taking samples of the inner random variable from the conditional distribution given the outer random variable. This property provides the advantage over many existing methods that it enables us to estimate nested expectations only with a dataset on the pair of the inner and outer variables drawn from the joint distribution. We show an upper bound on the mean squared error of the proposed method under some assumptions. Numerical experiments are conducted to compare our proposed method with several existing methods (nested Monte Carlo method, multilevel Monte Carlo method, and regression-based method), and we see that our proposed method is superior to the compared methods in terms of efficiency and applicability.

Influence maximization is the task of selecting a small number of seed nodes in a social network to maximize the spread of the influence from these seeds, and it has been widely investigated in the past two decades. In the canonical setting, the whole social network as well as its diffusion parameters is given as input. In this paper, we consider the more realistic sampling setting where the network is unknown and we only have a set of passively observed cascades that record the set of activated nodes at each diffusion step. We study the task of influence maximization from these cascade samples (IMS), and present constant approximation algorithms for this task under mild conditions on the seed set distribution. To achieve the optimization goal, we also provide a novel solution to the network inference problem, that is, learning diffusion parameters and the network structure from the cascade data. Comparing with prior solutions, our network inference algorithm requires weaker assumptions and does not rely on maximum-likelihood estimation and convex programming. Our IMS algorithms enhance the learning-and-then-optimization approach by allowing a constant approximation ratio even when the diffusion parameters are hard to learn, and we do not need any assumption related to the network structure or diffusion parameters.

The Bayesian paradigm has the potential to solve core issues of deep neural networks such as poor calibration and data inefficiency. Alas, scaling Bayesian inference to large weight spaces often requires restrictive approximations. In this work, we show that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors. The other weights are kept as point estimates. This subnetwork inference framework enables us to use expressive, otherwise intractable, posterior approximations over such subsets. In particular, we implement subnetwork linearized Laplace: We first obtain a MAP estimate of all weights and then infer a full-covariance Gaussian posterior over a subnetwork. We propose a subnetwork selection strategy that aims to maximally preserve the model's predictive uncertainty. Empirically, our approach is effective compared to ensembles and less expressive posterior approximations over full networks.

Owing to the recent advances in "Big Data" modeling and prediction tasks, variational Bayesian estimation has gained popularity due to their ability to provide exact solutions to approximate posteriors. One key technique for approximate inference is stochastic variational inference (SVI). SVI poses variational inference as a stochastic optimization problem and solves it iteratively using noisy gradient estimates. It aims to handle massive data for predictive and classification tasks by applying complex Bayesian models that have observed as well as latent variables. This paper aims to decentralize it allowing parallel computation, secure learning and robustness benefits. We use Alternating Direction Method of Multipliers in a top-down setting to develop a distributed SVI algorithm such that independent learners running inference algorithms only require sharing the estimated model parameters instead of their private datasets. Our work extends the distributed SVI-ADMM algorithm that we first propose, to an ADMM-based networked SVI algorithm in which not only are the learners working distributively but they share information according to rules of a graph by which they form a network. This kind of work lies under the umbrella of `deep learning over networks' and we verify our algorithm for a topic-modeling problem for corpus of Wikipedia articles. We illustrate the results on latent Dirichlet allocation (LDA) topic model in large document classification, compare performance with the centralized algorithm, and use numerical experiments to corroborate the analytical results.

Amortized inference has led to efficient approximate inference for large datasets. The quality of posterior inference is largely determined by two factors: a) the ability of the variational distribution to model the true posterior and b) the capacity of the recognition network to generalize inference over all datapoints. We analyze approximate inference in variational autoencoders in terms of these factors. We find that suboptimal inference is often due to amortizing inference rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.

Khue-Dung Dang,Luca Maestrini
0+阅读 · 11月26日
Joshua Cutler,Dmitriy Drusvyatskiy,Zaid Harchaoui
0+阅读 · 11月25日
Vaidotas Simkus,Benjamin Rhodes,Michael U. Gutmann
0+阅读 · 11月25日
Janet van Niekerk,Haavard Rue
0+阅读 · 11月25日
Zuowei Shen,Haizhao Yang,Shijun Zhang
0+阅读 · 11月25日
Tomohiko Hironaka,Takashi Goda
0+阅读 · 11月24日
Wei Chen,Xiaoming Sun,Jialin Zhang,Zhijie Zhang
4+阅读 · 6月7日
Erik Daxberger,Eric Nalisnick,James Urquhart Allingham,Javier Antorán,José Miguel Hernández-Lobato
9+阅读 · 2月18日
Hamza Anwar,Quanyan Zhu
3+阅读 · 2018年2月27日
Chris Cremer,Xuechen Li,David Duvenaud
3+阅读 · 2018年1月10日

5+阅读 · 2019年6月28日
CreateAMind
8+阅读 · 2019年5月18日

10+阅读 · 2018年12月24日
CreateAMind
3+阅读 · 2018年4月15日

24+阅读 · 2017年11月16日

3+阅读 · 2017年8月6日
CreateAMind
5+阅读 · 2017年8月4日
Top