Network meta-analysis (NMA) of rare events has attracted little attention in the literature. Until recently, networks of interventions with rare events were analyzed using the inverse-variance NMA approach. However, when events are rare, the normal approximation made by this model can be poor and effect estimates are potentially biased. Other methods for the synthesis of such data include the recent extension of the Mantel-Haenszel approach to NMA and the use of the non-central hypergeometric distribution. In this article, we suggest a new common-effect NMA approach that can be applied even in networks of interventions with extremely low or even zero numbers of events, without requiring study exclusion or arbitrary imputations. Our method applies the penalized likelihood function proposed by Firth for bias reduction of the maximum likelihood estimate to the logistic expression of the NMA model. A limitation of our method is that heterogeneity cannot be taken into account as an additive parameter, as in most meta-analytical models. However, we account for heterogeneity by incorporating a multiplicative overdispersion term using a two-stage approach. We show through simulations that our method performs consistently well across all tested scenarios and most often results in smaller bias than other available methods. We also illustrate the use of our method through two clinical examples. We conclude that our "penalized likelihood NMA" approach is promising for the analysis of binary outcomes with rare events, especially for networks with very few studies per comparison and very low control group risks.
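The Firth correction underlying this approach can be illustrated on a single logistic regression. The sketch below (plain NumPy, not the authors' NMA implementation) maximizes the penalized log-likelihood l(β) + ½ log|I(β)|, whose score adds h_i(½ − p_i) to each residual, where h_i are the diagonals of the weighted hat matrix; the estimate stays finite even when one arm has zero events:

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Firth bias-reduced logistic regression via Newton-Raphson.

    Maximizes l(beta) + 0.5*log|I(beta)|; the penalized score adds
    h_i*(1/2 - p_i) to each residual, h_i being the hat-matrix diagonals.
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        XtWX = X.T @ (W[:, None] * X)            # Fisher information I(beta)
        XtWX_inv = np.linalg.inv(XtWX)
        # hat-matrix diagonals: h_i = w_i * x_i^T I^{-1} x_i
        h = W * np.einsum('ij,jk,ik->i', X, XtWX_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))    # penalized score
        step = XtWX_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

For an intercept-only model with s events in n trials, this reproduces the known Firth estimate p = (s + 0.5)/(n + 1), and in a two-arm table with a zero-event arm it coincides with adding 0.5 to every cell, where ordinary maximum likelihood diverges.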
In recent years, local differential privacy (LDP) has emerged as a technique of choice for privacy-preserving data collection in scenarios where the aggregator is not trustworthy. LDP provides client-side privacy by adding noise at the user's end, so clients need not rely on the trustworthiness of the aggregator. The stronger privacy protection provided by LDP protocols (compared to the central model) comes at a much harsher privacy-utility trade-off. In this work, we provide a noise-aware probabilistic modeling framework that allows Bayesian inference to take into account the noise added for privacy under LDP, conditioned on locally perturbed observations. Our framework tackles several computational and statistical challenges posed by LDP for accurate uncertainty quantification in Bayesian settings. We demonstrate the efficacy of our framework in parameter estimation for univariate and multivariate distributions as well as logistic and linear regression.
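As a minimal illustration of the kind of LDP mechanism such a framework conditions on, here is binary randomized response together with the standard debiased frequency estimator (function names are illustrative; the paper's framework performs full noise-aware Bayesian inference rather than this simple debiasing):

```python
import numpy as np

def randomized_response(bits, eps, rng):
    """eps-LDP randomized response: report the true bit with probability
    exp(eps)/(1 + exp(eps)), otherwise report the flipped bit."""
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))
    flip = rng.random(len(bits)) >= p_keep
    return np.where(flip, 1 - bits, bits)

def debiased_mean(perturbed, eps):
    """Unbiased estimate of the true proportion from perturbed reports."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (perturbed.mean() - (1.0 - p)) / (2.0 * p - 1.0)
```

The debiasing inverts the known flip probability; a noise-aware Bayesian treatment would instead place the same flip likelihood inside the model and infer the parameter's full posterior.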
We study regression discontinuity designs in which many covariates, possibly many more than the number of observations, are available. We provide a two-step algorithm that first selects the set of covariates to be used through a localized Lasso-type procedure and then, in a second step, estimates the treatment effect by including the selected covariates in the usual local linear estimator. We provide an in-depth analysis of the algorithm's theoretical properties, showing that, under an approximate sparsity condition, the resulting estimator is asymptotically normal, with asymptotic bias and variance conceptually similar to those obtained in low-dimensional settings. Bandwidth selection and inference can be carried out using standard methods. We also provide simulations and an empirical application.
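In the no-covariate case, the second step of such a procedure reduces to the standard sharp-RD local linear estimator. A minimal sketch with a triangular kernel (not the authors' implementation, which additionally includes the Lasso-selected covariates):

```python
import numpy as np

def local_linear_rdd(x, y, cutoff=0.0, h=1.0):
    """Sharp-RD local linear estimator with a triangular kernel.

    Fits separate weighted linear regressions on each side of the cutoff
    and returns the jump in fitted values at the cutoff.
    """
    def side_fit(mask):
        xs, ys = x[mask] - cutoff, y[mask]
        w = np.maximum(0.0, 1.0 - np.abs(xs) / h)   # triangular kernel
        Xd = np.column_stack([np.ones_like(xs), xs])
        WX = w[:, None] * Xd
        coef = np.linalg.solve(Xd.T @ WX, WX.T @ ys)
        return coef[0]                               # value at the cutoff
    return side_fit(x >= cutoff) - side_fit(x < cutoff)
```

Because the fit is exactly linear within the bandwidth, the estimator recovers a piecewise-linear conditional mean without bias; in general the bandwidth h governs the usual bias-variance trade-off.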
In this paper, several related estimation problems are addressed from a Bayesian point of view, and optimal estimators are obtained for each of them when some natural loss functions are considered. Namely, we are interested in estimating a regression curve. Simultaneously, we consider the estimation problems of a conditional distribution function, a conditional density, and even the conditional distribution itself. All these problems are posed in a sufficiently general framework to cover continuous and discrete, univariate and multivariate, parametric and non-parametric cases, without the need to use a specific prior distribution. The loss functions considered come naturally from the quadratic error loss function commonly used in estimating a real function of the unknown parameter. The cornerstone of the mentioned Bayes estimators is the posterior predictive distribution. Some examples are provided to illustrate these results.
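For a concrete instance of the role of the posterior predictive, consider the conjugate Normal-Normal model: under quadratic loss, the Bayes estimate of a future observation is the mean of the posterior predictive distribution. A minimal sketch (a textbook special case, not the paper's general framework):

```python
import numpy as np

def posterior_predictive_normal(y, sigma2, m0, t0):
    """Conjugate model y_i ~ N(theta, sigma2), theta ~ N(m0, t0).

    Returns the mean and variance of the posterior predictive N(m_n, sigma2 + t_n);
    under quadratic loss, m_n is the Bayes estimate of a new observation.
    """
    n = len(y)
    t_n = 1.0 / (1.0 / t0 + n / sigma2)          # posterior variance of theta
    m_n = t_n * (m0 / t0 + y.sum() / sigma2)     # posterior mean of theta
    return m_n, sigma2 + t_n                     # predictive mean and variance
```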
Bayesian optimization (BO) is a powerful approach for optimizing black-box, expensive-to-evaluate functions. To enable a flexible trade-off between cost and accuracy, many applications allow the function to be evaluated at different fidelities. To reduce the optimization cost while maximizing the benefit-cost ratio, in this paper we propose Batch Multi-fidelity Bayesian Optimization with Deep Auto-Regressive Networks (BMBO-DARN). We use a set of Bayesian neural networks to construct a fully auto-regressive model, which is expressive enough to capture strong yet complex relationships across all the fidelities, so as to improve surrogate learning and optimization performance. Furthermore, to enhance the quality and diversity of queries, we develop a simple yet efficient batch querying method that avoids any combinatorial search over the fidelities. We propose a batch acquisition function based on the Max-value Entropy Search (MES) principle, which penalizes highly correlated queries and encourages diversity. We use posterior samples and moment matching to enable efficient computation of the acquisition function, and we conduct alternating optimization over each fidelity-input pair, which guarantees an improvement at each step. We demonstrate the advantage of our approach on four real-world hyperparameter optimization applications.
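The single-point MES term that such a batch criterion builds on has a closed form under a Gaussian posterior. A sketch of that term only, assuming a Gaussian surrogate and pre-drawn samples of the maximum value (this is not the paper's batch acquisition):

```python
import numpy as np
from math import erf

def mes_acquisition(mu, sigma, fstar_samples):
    """Single-point Max-value Entropy Search under a Gaussian posterior.

    alpha(x) = mean over sampled maxima f* of
               gamma*phi(gamma)/(2*Phi(gamma)) - log Phi(gamma),
    with gamma = (f* - mu(x)) / sigma(x).
    """
    mu = np.asarray(mu, float)[:, None]
    sigma = np.asarray(sigma, float)[:, None]
    g = (np.asarray(fstar_samples, float)[None, :] - mu) / sigma
    phi = np.exp(-g ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    Phi = 0.5 * (1.0 + np.vectorize(erf)(g / np.sqrt(2.0)))
    return (g * phi / (2.0 * Phi) - np.log(Phi)).mean(axis=1)
```

At equal posterior means, a point with larger posterior uncertainty receives a larger acquisition value, which is the information-gain intuition behind MES.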
Understanding how treatment effects vary with individual characteristics is critical in the contexts of personalized medicine, personalized advertising, and policy design. When the characteristics of practical interest are only a subset of the full covariates, non-parametric estimation is often desirable, but few methods are available due to the computational difficulty. Existing non-parametric methods, such as inverse probability weighting, have limitations that hinder their use in many practical settings where the values of propensity scores are close to 0 or 1. We propose propensity score regression (PSR), which allows non-parametric estimation of heterogeneous treatment effects in a wide range of settings. PSR comprises two non-parametric regressions in turn: it first regresses the outcome on the propensity score together with the characteristics of interest to obtain intermediate estimates, and then regresses the intermediate estimates on the characteristics of interest only. By including the propensity score as a regressor in a non-parametric manner, PSR substantially eases the computational difficulty while remaining (locally) insensitive to any value of the propensity score. We present several appealing properties of PSR, including consistency and asymptotic normality, and in particular the existence of an explicit variance estimator, from which the analytical behaviour of PSR and its precision can be assessed. Simulation studies indicate that PSR outperforms existing methods in various settings with extreme values of propensity scores. We apply our method to the National 2009 Flu Survey (NHFS) data to investigate the effects of seasonal influenza vaccination and having paid sick leave across different age groups.
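The two stages can be sketched with Nadaraya-Watson regression standing in for whatever non-parametric smoother one prefers (bandwidths and function names below are illustrative, not the authors' choices):

```python
import numpy as np

def nw(xq, x, y, h):
    """Nadaraya-Watson regression with a Gaussian kernel (rows = points)."""
    d2 = ((xq[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * h * h))
    return (w @ y) / w.sum(axis=1)

def psr(v_query, v, e, t, y, h1=0.15, h2=0.1):
    """Two-stage propensity score regression (PSR) sketch.

    Stage 1: within each arm, regress y on (propensity score, characteristic)
             and evaluate the fitted values at every unit.
    Stage 2: regress the fitted treated-minus-control differences on the
             characteristic of interest alone.
    """
    X = np.column_stack([e, v])
    m1 = nw(X, X[t == 1], y[t == 1], h1)   # fitted E[Y | T=1, e, v]
    m0 = nw(X, X[t == 0], y[t == 0], h1)   # fitted E[Y | T=0, e, v]
    return nw(v_query[:, None], v[:, None], m1 - m0, h2)
```

Because the propensity score enters only as a regressor, no weights of the form 1/e or 1/(1-e) appear, which is why extreme scores do not destabilize the estimate.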
Discrete data are abundant and often arise as counts or rounded data. However, even for linear regression models, conjugate priors and closed-form posteriors are typically unavailable, thereby necessitating approximations or Markov chain Monte Carlo for posterior inference. For a broad class of count and rounded data regression models, we introduce conjugate priors that enable closed-form posterior inference. Key posterior and predictive functionals are computable analytically or via direct Monte Carlo simulation. Crucially, the predictive distributions are discrete to match the support of the data and can be evaluated or simulated jointly across multiple covariate values. These tools are broadly useful for linear regression, nonlinear models via basis expansions, and model and variable selection. Multiple simulation studies demonstrate significant advantages in computing, predictive modeling, and selection relative to existing alternatives.
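The rounding construction behind such discrete predictives can be sketched in a toy conjugate Gaussian model: a latent draw from the continuous Gaussian predictive is floored and clipped at zero, yielding a pmf on the counts. This illustrates the general idea only, not the paper's regression machinery (parameter names are illustrative):

```python
from math import erf, sqrt

def count_predictive_pmf(y, sigma2=1.0, m0=0.0, t0=10.0, kmax=50):
    """Discrete predictive via a rounded-Gaussian construction.

    A latent z* ~ N(m_n, sigma2 + t_n) from the conjugate Gaussian predictive
    is mapped to a count by flooring and clipping at zero, giving
    P(0) = Phi((1 - m_n)/s) and, for k >= 1,
    P(k) = Phi((k + 1 - m_n)/s) - Phi((k - m_n)/s).
    """
    n = len(y)
    t_n = 1.0 / (1.0 / t0 + n / sigma2)
    m_n = t_n * (m0 / t0 + sum(y) / sigma2)
    s = sqrt(sigma2 + t_n)
    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    pmf = {0: Phi((1.0 - m_n) / s)}
    for k in range(1, kmax + 1):
        pmf[k] = Phi((k + 1.0 - m_n) / s) - Phi((k - m_n) / s)
    return pmf
```

The resulting predictive is a genuine pmf on the non-negative integers, matching the support of count data, while all computations reduce to Gaussian CDF evaluations.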
Longitudinal item response data are common in social science, educational science, and psychology, among other disciplines. Studying the time-varying relationships between items is crucial for educational assessment or designing marketing strategies from survey questions. Although dynamic network models have been widely developed, we cannot apply them directly to item response data because there are multiple systems of nodes with various types of local interactions among items, resulting in multiplex network structures. We propose a new model to study these temporal interactions among items by embedding the functional parameters within the exponential random graph model framework. Inference on such models is difficult because the likelihood functions contain intractable normalizing constants. Furthermore, the number of functional parameters grows exponentially as the number of items increases. Variable selection for such models is not trivial because standard shrinkage approaches do not consider temporal trends in functional parameters. To overcome these challenges, we develop a novel Bayesian approach by combining an auxiliary variable MCMC algorithm with a recently developed functional shrinkage method. We apply our algorithm to survey and review data sets, illustrating that the proposed approach avoids the evaluation of intractable normalizing constants while detecting significant temporal interactions among items. Through a simulation study under different scenarios, we examine the performance of our algorithm. Our method is, to our knowledge, the first attempt to select functional variables for models with intractable normalizing constants.
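The auxiliary-variable idea can be illustrated with the exchange algorithm on a toy exponential-family model where exact simulation is possible: the auxiliary draw makes the intractable normalizing constants cancel in the acceptance ratio. A sketch (not the paper's multiplex-network sampler; all names are illustrative):

```python
import numpy as np

def exchange_sampler(s_obs, n, iters=20000, step=0.5, prior_sd=10.0, seed=0):
    """Exchange algorithm for a toy model: n iid Bernoulli(sigmoid(theta))
    trials with sufficient statistic s, unnormalized likelihood exp(theta*s).

    An auxiliary data set w ~ model(theta') is drawn exactly, so the
    normalizing constants cancel and the log acceptance ratio is
    (theta' - theta) * (s_obs - s_w) + log prior ratio.
    """
    rng = np.random.default_rng(seed)
    theta, chain = 0.0, []
    for _ in range(iters):
        prop = theta + step * rng.standard_normal()
        # exact auxiliary sample at the proposed parameter
        s_w = rng.binomial(n, 1.0 / (1.0 + np.exp(-prop)))
        log_r = (prop - theta) * (s_obs - s_w) \
            + (theta ** 2 - prop ** 2) / (2.0 * prior_sd ** 2)
        if np.log(rng.random()) < log_r:
            theta = prop
        chain.append(theta)
    return np.array(chain)
```

In this toy case the normalizer (1 + e^theta)^n is actually available, which lets us check that the sampler targets the right posterior; in ERGM-type models it is not, and the same cancellation is what makes inference feasible.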
Active learning algorithms select a subset of data for annotation to maximize model performance under a budget. One such algorithm is Expected Gradient Length, which, as the name suggests, uses the approximate gradient induced per example in the sampling process. While Expected Gradient Length has been successfully used for classification and regression, its formulation for regression remains intuitively driven. Hence, our theoretical contribution involves deriving this formulation, thereby supporting the experimental evidence. Subsequently, we show that expected gradient length in regression is equivalent to Bayesian uncertainty. When the required assumptions are infeasible, our algorithmic contribution (EGL++) approximates the effect of ensembles with a single deterministic network. Instead of computing multiple possible inferences per input, we leverage previously annotated samples to quantify the probability of previous labels being the true label. This approach allows us to extend expected gradient length to a new task: human pose estimation. We perform experimental validation on two human pose datasets (MPII and LSP/LSPET), highlighting the interpretability and competitiveness of EGL++ against different active learning algorithms for human pose estimation.
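For squared-error linear regression with a Gaussian predictive, the expected gradient length of a candidate example has a closed form proportional to the predictive uncertainty, which is the kind of equivalence the abstract refers to. A minimal sketch (hypothetical helper names, not the paper's EGL++ procedure):

```python
import numpy as np

def egl_scores(X, pred_std):
    """Expected gradient length under squared error and a Gaussian predictive.

    The per-example gradient of (w.x - y)^2 is 2*(w.x - y)*x; averaging its
    norm over y ~ N(w.x, s^2) gives E||g|| = 2 * s * sqrt(2/pi) * ||x||.
    """
    return 2.0 * np.asarray(pred_std) * np.sqrt(2.0 / np.pi) \
        * np.linalg.norm(X, axis=1)

def select_batch(X, pred_std, k):
    """Pick the k unlabeled examples with the largest expected gradient."""
    return np.argsort(-egl_scores(X, pred_std))[:k]
```

The score factorizes into predictive standard deviation times input norm, so ranking by expected gradient length is, for equal-norm inputs, ranking by Bayesian uncertainty.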
In this paper, we study the properties of nonparametric least squares regression using deep neural networks. We derive non-asymptotic upper bounds for the prediction error of the empirical risk minimizer for feedforward deep neural regression. Our error bounds achieve the minimax optimal rate and significantly improve over the existing ones in the sense that they depend polynomially on the dimension of the predictor, rather than exponentially. We show that the neural regression estimator can circumvent the curse of dimensionality under the assumption that the predictor is supported on an approximate low-dimensional manifold or a set with low Minkowski dimension. These assumptions differ from the structural conditions imposed on the target regression function and are weaker and more realistic than the exact low-dimensional manifold support assumption. We investigate how the prediction error of the neural regression estimator depends on the structure of neural networks and propose a notion of network relative efficiency between two types of neural networks, which provides a quantitative measure for evaluating the relative merits of different network structures. To establish these results, we derive a novel approximation error bound for Hölder smooth functions with a positive smoothness index using ReLU activated neural networks, which may be of independent interest. Our results are derived under weaker assumptions on the data distribution and the neural network structure than those in the existing literature.
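The dimension dependence of the classical minimax rate n^(-2β/(2β+d)) makes the low-dimensional-support assumption concrete: replacing the ambient dimension d with a small intrinsic dimension changes the achievable error dramatically. A quick numeric illustration (the rate formula is textbook; the specific numbers are illustrative, not from the paper):

```python
def minimax_rate(n, beta, d):
    """Minimax L2 rate n^(-2*beta/(2*beta + d)) for beta-Holder regression
    with d-dimensional predictors."""
    return n ** (-2.0 * beta / (2.0 * beta + d))
```

With n = 10^6 samples and β = 1, the rate for ambient dimension d = 100 is roughly 0.76, essentially no convergence, whereas an intrinsic dimension of 5 gives roughly 0.02.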
Heatmap-based methods dominate the field of human pose estimation by modelling the output distribution through likelihood heatmaps. In contrast, regression-based methods are more efficient but suffer from inferior performance. In this work, we explore maximum likelihood estimation (MLE) to develop an efficient and effective regression-based method. From the MLE perspective, adopting a particular regression loss amounts to making an assumption about the output density function; a density function closer to the true distribution leads to better regression performance. In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution. Concretely, RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process. With the proposed reparameterization design, our method is compatible with off-the-shelf flow models. The proposed method is effective, efficient, and flexible. We show its potential in various human pose estimation tasks with comprehensive experiments. Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead. Moreover, for the first time, our regression-based method is superior to heatmap-based methods, especially on multi-person pose estimation. Our code is available at https://github.com/Jeff-sjtu/res-loglikelihood-regression
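The MLE perspective above can be made concrete: choosing a regression loss fixes an assumed output density. Up to additive constants, the L2 loss is the Gaussian negative log-likelihood and the L1 loss is the Laplace one (a textbook identity, not the paper's flow-based residual model):

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Gaussian NLL up to an additive constant: with sigma fixed this is the
    (scaled) L2 loss, so L2 regression assumes a Gaussian output density."""
    return np.log(sigma) + (y - mu) ** 2 / (2.0 * sigma ** 2)

def laplace_nll(y, mu, b):
    """Laplace NLL up to an additive constant: with b fixed this is the
    (scaled) L1 loss, so L1 regression assumes a Laplace output density."""
    return np.log(b) + np.abs(y - mu) / b
```

RLE pushes this logic further: instead of fixing a density family in advance, a normalizing flow learns the residual between a simple base density and the true output distribution.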