** In paired-design studies, it is common to have multiple measurements taken on the same set of subjects under different conditions. In observational studies, it is often of interest to pair-match a treatment group and a control group on multiple covariates, and to test the treatment effect, represented by multiple response variables, on the well-matched pairs. However, effective tests for multivariate paired data are lacking. The multivariate paired Hotelling's $T^2$ test can sometimes be used, but its power decreases quickly as the dimension increases. Existing methods for assessing the balance of multiple covariates in matched observational studies usually ignore the paired structure and thus do not perform well in some settings. In this work, we propose a new non-parametric test for paired data that exhibits a substantial power improvement over existing methods under a wide range of situations. We also derive the asymptotic distribution of the new test and show through simulation studies that the approximate $p$-value is reasonably accurate in finite samples, even when the dimension is larger than the sample size, making the new test an easy off-the-shelf tool for real applications. The proposed test is illustrated through an analysis of a real data set from Alzheimer's disease research. **
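As a point of reference, the classical paired Hotelling's $T^2$ test mentioned above can be sketched in a few lines. This is a minimal illustration of the baseline, not the proposed new test; the synthetic data and the usual $F$-approximation (which requires $n > p$) are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def paired_hotelling_t2(x, y):
    """Classical paired Hotelling's T^2 test: tests whether the mean
    vector of the paired differences equals zero."""
    d = x - y                        # paired differences, shape (n, p)
    n, p = d.shape
    dbar = d.mean(axis=0)
    s = np.cov(d, rowvar=False)      # sample covariance of differences
    t2 = n * dbar @ np.linalg.solve(s, dbar)
    # Convert T^2 to an F statistic (valid only when n > p)
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_value = stats.f.sf(f_stat, p, n - p)
    return t2, p_value

rng = np.random.default_rng(0)
n, p = 50, 3
x = rng.normal(size=(n, p))
y = x + rng.normal(scale=0.5, size=(n, p))   # paired, no systematic shift
t2, pval = paired_hotelling_t2(x, y)
```

As the abstract notes, the $F$-approximation breaks down entirely once $p \ge n$, since the sample covariance of the differences becomes singular.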

** We consider the Bayesian analysis of models in which the unknown distribution of the outcomes is specified up to a set of conditional moment restrictions. The nonparametric exponentially tilted empirical likelihood function is constructed to satisfy a sequence of unconditional moments based on an increasing (in sample size) vector of approximating functions (such as tensor splines based on the splines of each conditioning variable). For any given sample size, results are robust to the number of expanded moments. We derive Bernstein-von Mises theorems for the behavior of the posterior distribution under both correct and incorrect specification of the conditional moments, subject to growth rate conditions (slower under misspecification) on the number of approximating functions. A large-sample theory for comparing different conditional moment models is also developed. The central result is that the marginal likelihood criterion selects the model that is less misspecified. We also introduce sparsity-based model search for high-dimensional conditioning variables, and provide efficient MCMC computations for high-dimensional parameters. Along with clarifying examples, the framework is illustrated with real-data applications to risk-factor determination in finance, and causal inference under conditional ignorability. **
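The moment-expansion step can be made concrete with a small sketch. Here a polynomial basis stands in for the splines described above, and a simple location model $E[y - \theta \mid x] = 0$ is a hypothetical conditional moment restriction chosen for illustration:

```python
import numpy as np

def expanded_moments(theta, y, x, K):
    """Turn the conditional moment E[y - theta | x] = 0 into K
    unconditional sample moments E[(y - theta) * q_k(x)] = 0, using a
    polynomial basis q_k(x) = x**k as a stand-in for splines."""
    basis = np.vander(x, K)                  # columns x^(K-1), ..., x, 1
    residual = (y - theta)[:, None]
    return (residual * basis).mean(axis=0)   # K-vector of sample moments

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=500)
y = 2.0 + rng.normal(scale=0.1, size=500)    # true theta = 2
g = expanded_moments(2.0, y, x, K=4)         # approximately zero at truth
```

In the framework above, the number of such approximating functions grows with the sample size, subject to the growth rate conditions the theorems impose.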

** Constraint-based causal structure learning for point processes requires empirical tests of local independence. Existing tests require strong model assumptions, e.g., that the true data-generating model is a Hawkes process with no latent confounders. Even when attention is restricted to Hawkes processes, latent confounders are a major technical difficulty because a marginalized process will generally not be a Hawkes process itself. We introduce an expansion similar to Volterra expansions as a tool to represent marginalized intensities. Our main theoretical result is that such expansions can approximate the true marginalized intensity arbitrarily well. Based on this, we propose a test of local independence and investigate its properties in real and simulated data. **
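For context, the conditional intensity of an (unmarginalized) univariate Hawkes process with an exponential kernel can be computed directly from past event times. A minimal sketch, with illustrative parameter values:

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta*(t - t_i))
    of a univariate Hawkes process with exponential excitation kernel."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = np.array([0.5, 1.2, 2.0])                      # past event times
lam = hawkes_intensity(2.5, events, mu=0.3, alpha=0.8, beta=1.5)
```

It is exactly this simple additive form that is lost under marginalization over latent coordinates, which is what motivates the Volterra-style expansion above.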

** In this paper we address the computational feasibility of the class of decision-theoretic models referred to as adversarial risk analysis (ARA). These are models in which a decision must be made with consideration for how an intelligent adversary may behave, and in which the adversary's decision-making process is unknown and is elicited by analyzing the adversary's decision problem using priors on his utility function and beliefs. The motivation of this research was to develop a computational algorithm that can be applied across a broad range of ARA models; to the best of our knowledge, no such algorithm currently exists. Using a two-person sequential model, we incrementally increase the size of the model and develop a simulation-based approximation of the true optimum where an exact solution is computationally impractical. In particular, we begin with a relatively large decision space by considering a theoretically continuous space that must be discretized. We then incrementally increase the number of strategic objectives, which causes the decision space to grow exponentially. The problem is exacerbated by the presence of an intelligent adversary who must also solve an exponentially large decision problem according to some unknown decision-making process. Nevertheless, using a stylized example that can be solved analytically, we show that our algorithm not only solves large ARA models quickly but also accurately selects the true optimal solution. Furthermore, the algorithm is sufficiently general that it can be applied to any ARA model with a large, yet finite, decision space. **
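The simulation-based idea can be caricatured in a toy two-person sequential model: sample the adversary's unknown utility from a prior, compute his best response to each defender action, and average the defender's utility over simulations. All utilities and priors below are hypothetical stand-ins, not the paper's model:

```python
import numpy as np

def ara_monte_carlo(defender_actions, attacker_actions, n_sims=2000, seed=4):
    """Toy sequential ARA: the defender moves first; the attacker's
    utility weight w is unknown, so we place a prior on it, simulate
    attacker best responses, and average the defender's utility."""
    rng = np.random.default_rng(seed)
    best_d, best_u = None, -np.inf
    for d in defender_actions:
        total = 0.0
        for _ in range(n_sims):
            w = rng.uniform(0.5, 1.5)         # prior on attacker's weight
            # attacker best-responds to d under the sampled utility
            a = max(attacker_actions, key=lambda a_: (w - d) * a_)
            total += d - 0.5 * a              # defender's (toy) utility
        u = total / n_sims
        if u > best_u:
            best_d, best_u = d, u
    return best_d

best = ara_monte_carlo([0.0, 0.5, 1.0, 1.5, 2.0], [0.0, 1.0])
```

The full algorithm in the paper handles far larger decision spaces, but the structure is the same: an outer search over the defender's discretized actions and an inner Monte Carlo over the adversary's elicited decision problem.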

** Semi-Supervised Learning (SSL) approaches have been an influential framework for leveraging unlabeled data when the labeled data available during training are insufficient. SSL methods based on Convolutional Neural Networks (CNNs) have recently provided successful results on standard benchmark tasks such as image classification. In this work, we consider the general setting of the SSL problem, where the labeled and unlabeled data come from the same underlying probability distribution. We propose a new approach that adopts an Optimal Transport (OT) technique, serving as a metric of similarity between discrete empirical probability measures, to provide pseudo-labels for the unlabeled data, which can then be used in conjunction with the initial labeled data to train the CNN model in an SSL manner. We have evaluated and compared our proposed method with state-of-the-art SSL algorithms on standard datasets to demonstrate the superiority and effectiveness of our SSL algorithm. **
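A generic version of OT-based pseudo-labeling can be sketched with a hand-rolled entropy-regularized (Sinkhorn) solver. The class prototypes, balanced class masses, and toy features below are illustrative assumptions, not the authors' construction:

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iter=200):
    """Entropy-regularized OT via Sinkhorn iterations: returns the
    transport plan between histograms a and b for a given cost matrix."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy setup: 2 class prototypes (from labeled data), 4 unlabeled points.
prototypes = np.array([[0.0, 0.0], [5.0, 5.0]])
unlabeled = np.array([[0.2, -0.1], [4.8, 5.1], [0.1, 0.3], [5.2, 4.9]])
cost = ((unlabeled[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
a = np.full(4, 1 / 4)            # uniform mass on unlabeled points
b = np.full(2, 1 / 2)            # assumed balanced classes
plan = sinkhorn(cost, a, b)
pseudo_labels = plan.argmax(axis=1)   # class receiving most mass per point
```

The marginal constraint on `b` is what distinguishes this from a plain nearest-prototype rule: the plan is forced to distribute mass across classes rather than collapsing onto one.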

** Recommender systems are central to modern online platforms, but a popular concern is that they may be pulling society in dangerous directions (e.g., towards filter bubbles). However, a challenge with measuring the effects of recommender systems is how to compare user outcomes under these systems to outcomes under a credible counterfactual world without such systems. We take a model-based approach to this challenge, introducing a dichotomy of process models that we can compare: (1) a "recommender" model describing a generic item-matching process under a personalized recommender system and (2) an "organic" model describing a baseline counterfactual where users search for items without the mediation of any system. Our key finding is that the recommender and organic models result in dramatically different outcomes at both the individual and societal level, as supported by theorems and simulation experiments with real data. The two process models also induce different trade-offs during inference, where standard performance-improving techniques such as regularization/shrinkage have divergent effects. Shrinkage improves the mean squared error of matches in both settings, as expected, but at the cost of less diverse (less radical) items chosen in the recommender model but more diverse (more radical) items chosen in the organic model. These findings provide a formal language for how recommender systems may be fundamentally altering how we search for and interact with content, in a world increasingly mediated by such systems. **

Combining Parametric and Nonparametric Models to Estimate Treatment Effects in Observational Studies

** Performing causal inference in observational studies requires the assumption that confounding variables are correctly adjusted for. G-computation methods are often used in these scenarios, with several recent proposals using Bayesian versions of g-computation. In settings with few confounders, standard models can be employed; however, as the number of confounders increases, these models become less feasible, as there are fewer observations available for each unique combination of confounding variables. In this paper we propose a new model for estimating treatment effects in observational studies that incorporates both parametric and nonparametric outcome models. By conceptually splitting the data, we can combine these models while maintaining a conjugate framework, allowing us to avoid the use of MCMC methods. Approximations using the central limit theorem and random sampling allow our method to scale to high-dimensional confounders while maintaining computational efficiency. We illustrate the model using carefully constructed simulation studies, and we compare its computational cost to that of other benchmark models. **
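For orientation, plain (non-Bayesian) g-computation with a single confounder and a linear outcome model reduces to a few lines; this sketch shows the standardization step that the Bayesian versions build on, on simulated data with a known treatment effect of 2:

```python
import numpy as np

def g_computation(y, a, x):
    """Frequentist g-computation with a linear outcome model: fit
    E[Y | A, X] by least squares, then average the predicted outcomes
    with treatment set to 1 versus 0 for every subject."""
    design = np.column_stack([np.ones_like(a), a, x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    d1 = np.column_stack([np.ones_like(a), np.ones_like(a), x])
    d0 = np.column_stack([np.ones_like(a), np.zeros_like(a), x])
    return (d1 @ beta).mean() - (d0 @ beta).mean()

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
a = (x + rng.normal(size=n) > 0).astype(float)   # confounded treatment
y = 2.0 * a + 1.5 * x + rng.normal(size=n)       # true effect = 2
ate = g_computation(y, a, x)
```

The feasibility problem described above appears when `x` becomes high-dimensional: a fully saturated outcome model has ever fewer observations per confounder stratum, which is what motivates combining parametric and nonparametric components.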

** This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators so that this error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical and proposed estimators in estimating the causal quantities. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural-network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approach to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly substantial when the causal effects are correctly accounted for. **
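To illustrate why ignoring confounding inflates the error, one can contrast a naive difference in means with a simple confounding-aware estimator; inverse-probability weighting is used here as a generic stand-in for the paper's proposal, on simulated lending-style data with a known repayment effect of 1:

```python
import numpy as np

def naive_effect(y, a):
    """Naive difference in mean repayment between approved and
    declined groups, ignoring confounding."""
    return y[a == 1].mean() - y[a == 0].mean()

def ipw_effect(y, a, p):
    """Inverse-probability-weighted estimator using the propensity
    scores p = P(A = 1 | X) of the credit decision."""
    return (a * y / p).mean() - ((1 - a) * y / (1 - p)).mean()

rng = np.random.default_rng(3)
n = 20000
x = rng.uniform(-1, 1, size=n)               # borrower risk factor
p = 1 / (1 + np.exp(-2 * x))                 # confounded credit decision
a = (rng.uniform(size=n) < p).astype(float)
y = 1.0 * a - 2.0 * x + rng.normal(size=n)   # true causal effect = 1
naive = naive_effect(y, a)
ipw = ipw_effect(y, a, p)
```

Because the lender approves riskier-looking borrowers less often, the naive contrast is badly biased here, while the weighted estimator recovers the causal effect; the abstract's "relative reduction of estimation error" refers to exactly this kind of gap.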