Differential Granger causality, that is, understanding how Granger causal relations differ between two related time series, is of interest in many scientific applications. Modeling each time series by a vector autoregressive (VAR) model, we propose a new method to directly learn the difference between the corresponding transition matrices in high dimensions. Key to the new method is an estimating equation constructed from the Yule-Walker equation, which links the difference in transition matrices to the difference in the corresponding precision matrices. In contrast to separately estimating each transition matrix and then taking the difference, the proposed direct estimation method only requires sparsity of the difference of the two VAR models, and hence allows hub nodes in each high-dimensional time series. The direct estimator is shown to be consistent in estimation and support recovery under mild assumptions. These results also lead to novel consistency results, with potentially faster convergence rates, for estimating differences between precision matrices of i.i.d. observations under weaker assumptions than existing results. We evaluate the finite-sample performance of the proposed method using simulation studies and an application to electroencephalogram (EEG) data.
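The Yule-Walker link mentioned above can be sketched for stationary VAR(1) models; this is the standard identity, not necessarily the paper's exact estimating equation. For $X_t^{(j)} = A_j X_{t-1}^{(j)} + \epsilon_t^{(j)}$, $j = 1, 2$:

```latex
% Yule-Walker for VAR(1): the lag-1 autocovariance equals A times the lag-0 covariance
\Sigma_j(1) = A_j \Sigma_j(0)
  \quad\Longrightarrow\quad
A_j = \Sigma_j(1)\,\Sigma_j(0)^{-1} = \Sigma_j(1)\,\Omega_j ,
% so the difference of transition matrices is tied to the precision matrices:
\Delta := A_1 - A_2 = \Sigma_1(1)\,\Omega_1 - \Sigma_2(1)\,\Omega_2 .
```

Here $\Omega_j = \Sigma_j(0)^{-1}$ is the precision matrix of the stationary distribution; $\Delta$ can be sparse even when neither $A_j$ is, which is what makes direct estimation of the difference attractive.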
Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations of causal assumptions. However, prediction performance does depend on the selection of training and test sets. Biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on the Kemmeren genetic perturbation data set. We find that sample selection bias is likely a key driver of model performance. We propose using a less biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models have similar or worse performance compared to standard association-based estimators such as the Lasso. Finally, we compare the performance of causal estimators in simulation studies that reproduce the Kemmeren structure of genetic knockout experiments but are free of sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use the Kemmeren data set.
Modern data collection methods and computation tools have made it possible to monitor high-dimensional processes. In this article, Phase II monitoring of high-dimensional processes is investigated when the number of samples available from Phase I is limited in comparison to the number of variables. A new charting statistic for high-dimensional multivariate processes based on the diagonal elements of the underlying covariance matrix is introduced, and a unified procedure for Phases I and II employing a self-starting control chart is proposed. To remedy the effect of outliers, we adopt a robust procedure for parameter estimation in Phase I and introduce appropriate consistent estimators. The statistical performance of the proposed method in Phase II is evaluated through the average run length (ARL) criterion in the absence and presence of outliers; the results reveal that the proposed control chart scheme effectively detects various kinds of shifts in the process mean. Finally, we illustrate the applicability of our proposed method via a real-world example.
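A charting statistic built from only the diagonal elements of the covariance matrix can be sketched as follows. The function name, the standardization, and the chi-square approximation are illustrative assumptions, not the paper's exact statistic:

```python
import numpy as np

def diag_chart_statistic(x_new, mu0, sigma2_diag):
    """Monitor a new p-dimensional observation using only the diagonal
    of the covariance matrix (per-variable variances), avoiding the
    ill-conditioned full covariance when samples are scarce.
    Hypothetical illustration of the idea, not the paper's statistic."""
    p = len(mu0)
    # Standardize each coordinate by its own in-control mean and variance
    z = (x_new - mu0) / np.sqrt(sigma2_diag)
    t = np.sum(z ** 2)
    # Under in-control normality t is approximately chi^2 with p degrees
    # of freedom; center and scale for a high-dimensional limit
    return (t - p) / np.sqrt(2 * p)
```

A chart would signal an out-of-control state when this statistic exceeds a control limit calibrated to the desired in-control ARL.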
The goal of many scientific experiments, including A/B tests, is to estimate the average treatment effect (ATE), which is defined as the difference between the expected outcomes of two or more treatments. In this paper, we consider a situation where an experimenter can assign a treatment to research subjects sequentially. In adaptive experimental design, the experimenter is allowed to change the probability of assigning a treatment using past observations in order to estimate the ATE efficiently. However, with this approach it is difficult to apply standard statistical methods to construct an estimator because the observations are not independent and identically distributed. We thus propose an algorithm for efficient experiments with estimators constructed from dependent samples. We also introduce a sequential testing framework using the proposed estimator. To justify our proposed approach, we provide finite-sample and asymptotic analyses. Finally, we show experimentally that the proposed algorithm exhibits preferable performance.
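The adaptive design described above can be illustrated with a minimal sketch: the assignment probability is updated from past observations (here toward a Neyman-style allocation based on running standard deviations), and the ATE is estimated by inverse-probability weighting, which tolerates the resulting dependence because each assignment probability depends only on the past (a martingale argument). The function, allocation rule, and clipping are hypothetical, not the paper's algorithm:

```python
import numpy as np

def adaptive_ipw_ate(outcome_fn, n, seed=0):
    """Run a toy adaptive experiment and return an IPW estimate of the
    ATE.  outcome_fn(a, rng) draws an outcome for treatment arm a in
    {0, 1}.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    y1, y0 = [], []   # outcomes observed in each arm so far
    terms = []        # one IPW term per round
    for _ in range(n):
        # Adaptive step: Neyman-style allocation from running std estimates
        s1 = np.std(y1) if len(y1) > 1 else 1.0
        s0 = np.std(y0) if len(y0) > 1 else 1.0
        p = np.clip((s1 + 1e-6) / (s1 + s0 + 2e-6), 0.1, 0.9)
        a = int(rng.random() < p)           # assign treatment with prob p
        y = outcome_fn(a, rng)
        (y1 if a else y0).append(y)
        # IPW term: unbiased for the ATE given the past, despite dependence
        terms.append(a * y / p - (1 - a) * y / (1 - p))
    return float(np.mean(terms))
```

Clipping the probability away from 0 and 1 keeps the inverse weights bounded, a standard requirement for the variance of such estimators.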
We consider the Bayesian analysis of models in which the unknown distribution of the outcomes is specified up to a set of conditional moment restrictions. The nonparametric exponentially tilted empirical likelihood function is constructed to satisfy a sequence of unconditional moments based on an increasing (in sample size) vector of approximating functions (such as tensor splines based on the splines of each conditioning variable). For any given sample size, results are robust to the number of expanded moments. We derive Bernstein-von Mises theorems for the behavior of the posterior distribution under both correct and incorrect specification of the conditional moments, subject to growth rate conditions (slower under misspecification) on the number of approximating functions. A large-sample theory for comparing different conditional moment models is also developed. The central result is that the marginal likelihood criterion selects the model that is less misspecified. We also introduce sparsity-based model search for high-dimensional conditioning variables, and provide efficient MCMC computations for high-dimensional parameters. Along with clarifying examples, the framework is illustrated with real-data applications to risk-factor determination in finance, and causal inference under conditional ignorability.
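The exponentially tilted empirical likelihood (ETEL) construction referenced above can be sketched in its standard form; this is the textbook ETEL, with $g$ the vector of expanded unconditional moment functions, and may differ in details from the paper's exact construction:

```latex
% ETEL probabilities: exponential tilting of the empirical distribution
% toward the moment conditions E[g(X,\theta)] = 0
\hat{p}_i(\theta)
  = \frac{\exp\{\hat{\lambda}(\theta)^{\top} g(X_i,\theta)\}}
         {\sum_{k=1}^{n} \exp\{\hat{\lambda}(\theta)^{\top} g(X_k,\theta)\}},
\qquad
\hat{\lambda}(\theta)
  = \arg\min_{\lambda} \frac{1}{n}\sum_{i=1}^{n}
      \exp\{\lambda^{\top} g(X_i,\theta)\} .
```

The ETEL function is then $L(\theta) = \prod_{i=1}^{n} \hat{p}_i(\theta)$, and the posterior is proportional to the prior times $L(\theta)$; as the number of approximating functions grows, the unconditional moments increasingly capture the conditional restrictions.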
Numerically computing the rank of a matrix is a fundamental problem in scientific computation. The data sets generated on the Internet often correspond to high-dimensional sparse matrices. Notwithstanding recent advances in traditional singular value decomposition (SVD), an efficient algorithm for estimating the rank of a high-dimensional sparse matrix is still lacking. Inspired by the controllability theory of complex networks, we convert the computation of the rank of a matrix into a maximum-matching problem. We then establish a fast rank estimation algorithm using the cavity method, a powerful approximation technique for computing maximum matchings, to estimate the rank of a sparse matrix. Owing to the naturally low complexity of the cavity method, we show that the rank of a high-dimensional sparse matrix can be estimated much faster than with SVD, with high accuracy. Our method offers an efficient pathway to quickly estimate the rank of a high-dimensional sparse matrix when the time cost of computing the rank by SVD is unacceptable.
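The matching view of rank can be made concrete: the maximum bipartite matching between rows and columns of the nonzero pattern equals the structural (generic) rank, which upper-bounds the numerical rank and matches it for generic entry values. The sketch below uses a simple augmenting-path matching in place of the cavity method; both compute a maximum matching, but the cavity method scales to far larger instances:

```python
from collections import defaultdict

def structural_rank(entries, n_rows, n_cols):
    """Maximum bipartite matching over the nonzero pattern of a sparse
    matrix, given as (row, col) pairs.  The matching size equals the
    structural rank.  Simple augmenting-path sketch, not the cavity
    method from the abstract."""
    adj = defaultdict(list)      # row -> columns holding nonzeros
    for r, c in entries:
        adj[r].append(c)
    match_col = {}               # column -> row currently matched to it

    def try_augment(r, seen):
        # Try to match row r, displacing earlier matches along an
        # augmenting path if necessary
        for c in adj[r]:
            if c in seen:
                continue
            seen.add(c)
            if c not in match_col or try_augment(match_col[c], seen):
                match_col[c] = r
                return True
        return False

    return sum(try_augment(r, set()) for r in range(n_rows))
```

For a 3x3 identity pattern the matching pairs each row with its own column, giving structural rank 3; two nonzeros stacked in one column can only match one row.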
It is crucially important to estimate unknown parameters in earth system models by integrating observations and numerical simulations. For many applications in the earth system sciences, optimization methods that allow parameters to change over time are required. Here I present an efficient and practical method to estimate the time-varying parameters of relatively low-dimensional models by combining offline batch optimization and online data assimilation. In the newly proposed method, called Hybrid Offline Online Parameter Estimation with Particle Filtering (HOOPE-PF), the model parameters estimated by sequential data assimilation are constrained to the result of an offline batch optimization, in which the posterior distribution of the model parameters is obtained by comparing the simulated and observed climatology. HOOPE-PF outperforms the original sampling-importance-resampling particle filter in a synthetic experiment with a toy model and in a real-data experiment with a conceptual hydrological model. The advantage of HOOPE-PF is that the performance of the online data assimilation is not greatly affected by the hyperparameter of ensemble data assimilation that inflates the ensemble variance of the estimated parameters.
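The core idea, constraining a particle filter's parameter perturbations to an offline posterior rather than using free jitter, can be sketched on a toy observation model. The function name, the Gaussian stand-in for the offline posterior, the toy model obs[t] = theta + noise, and the mixing weights are all illustrative assumptions:

```python
import numpy as np

def constrained_parameter_pf(obs, prior_mean, prior_std,
                             n_particles=500, seed=0):
    """Sampling-importance-resampling particle filter for a scalar
    parameter, with the rejuvenation step anchored to an 'offline'
    Gaussian posterior (prior_mean, prior_std) instead of free jitter.
    Toy sketch of the idea, not HOOPE-PF itself."""
    rng = np.random.default_rng(seed)
    # Initialize parameter particles from the offline posterior
    theta = rng.normal(prior_mean, prior_std, n_particles)
    for y in obs:
        w = np.exp(-0.5 * (y - theta) ** 2)     # Gaussian likelihood weights
        w /= w.sum()
        idx = rng.choice(n_particles, n_particles, p=w)   # resample
        theta = theta[idx]
        # Constrained rejuvenation: mix resampled particles with fresh
        # draws from the offline posterior, keeping variance bounded
        theta = 0.9 * theta + 0.1 * rng.normal(prior_mean, prior_std,
                                               n_particles)
    return float(theta.mean())
```

With observations near 2.0 and an offline posterior centered at 1.5, the estimate is pulled toward the observations while remaining anchored by the offline result, illustrating how the anchor limits sensitivity to the rejuvenation hyperparameter.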
Many real-world optimization problems involve uncertain parameters with probability distributions that can be estimated using contextual feature information. In contrast to the standard approach of first estimating the distribution of the uncertain parameters and then optimizing the objective based on that estimate, we propose an integrated conditional estimation-optimization (ICEO) framework that estimates the underlying conditional distribution of the random parameter while accounting for the structure of the optimization problem. We directly model the relationship between the conditional distribution of the random parameter and the contextual features, and then estimate the probabilistic model with an objective that aligns with the downstream optimization problem. We show that our ICEO approach is asymptotically consistent under moderate regularity conditions and further provide finite-sample performance guarantees in the form of generalization bounds. Computationally, performing estimation with the ICEO approach is a non-convex and often non-differentiable optimization problem. We propose a general methodology for approximating the potentially non-differentiable mapping from the estimated conditional distribution to the optimal decision by a differentiable function, which greatly improves the performance of gradient-based algorithms applied to the non-convex problem. We also provide a polynomial optimization solution approach for the semi-algebraic case. Numerical experiments demonstrate the empirical success of our approach in a variety of situations, including limited data samples and model mismatch.
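The idea of smoothing a non-differentiable decision map can be illustrated with the simplest case, discrete action selection: replacing the hard argmin of estimated costs by a softmin-weighted decision that is differentiable in those costs. The function and the temperature parameter tau are hypothetical illustrations of the general smoothing idea, not the paper's construction:

```python
import numpy as np

def soft_argmin_decision(costs, tau=0.1):
    """Differentiable surrogate for the argmin decision map: a softmin
    distribution over actions, smooth in the estimated costs, so
    gradients can flow from the decision loss back to the estimator.
    As tau -> 0 this recovers the hard argmin."""
    # Subtract the minimum for numerical stability before exponentiating
    w = np.exp(-(costs - costs.min()) / tau)
    return w / w.sum()
```

In an integrated pipeline the downstream objective would be evaluated at this soft decision, making the end-to-end training loss amenable to gradient-based optimization.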
We study the role of interactivity in distributed statistical inference under information constraints, e.g., communication constraints and local differential privacy. We focus on the tasks of goodness-of-fit testing and estimation of discrete distributions. From prior work, these tasks are well understood under noninteractive protocols. Extending those approaches directly to interactive protocols is difficult due to correlations that can build up through interactivity; in fact, gaps can be found in prior claims of tight bounds for distribution estimation using interactive protocols. We propose a new approach to handle this correlation and develop a unified method for establishing lower bounds for both tasks. As an application, we obtain optimal bounds for both estimation and testing under local differential privacy and communication constraints. We also provide an example of a natural testing problem where interactivity helps.
Ordinary differential equations (ODEs) are widely used to model biological and physical processes in science. In this article, we propose a new reproducing kernel-based approach for estimation and inference of ODEs given noisy observations. We do not assume the functional forms in the ODE to be known or restrict them to be linear or additive, and we allow pairwise interactions. We perform sparse estimation to select individual functionals, and construct confidence intervals for the estimated signal trajectories. We establish the estimation optimality and selection consistency of kernel ODE under both the low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. Our proposal builds upon the smoothing spline analysis of variance (SS-ANOVA) framework, but tackles several important problems that are not yet fully addressed there, thereby also extending the scope of existing SS-ANOVA. We demonstrate the efficacy of our method through numerous ODE examples.
Implicit probabilistic models are models defined naturally in terms of a sampling procedure; they often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.