Various goodness-of-fit tests are designed based on the so-called information matrix equivalence: if the assumed model is correctly specified, two information matrices derived from the likelihood function are equivalent. In the literature, this principle has been established for the likelihood function with fully observed data, but it has not been verified under the likelihood for censored data. In this manuscript, we prove the information matrix equivalence in the framework of semiparametric copula models for multivariate censored survival data. Based on this equivalence, we propose an information ratio (IR) test for the specification of the copula function. The IR statistic is constructed by comparing consistent estimates of the two information matrices. We derive the asymptotic distribution of the IR statistic and propose a parametric bootstrap procedure for finite-sample $P$-value calculation. The performance of the IR test is investigated via a simulation study and a real data example.
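As a minimal toy illustration of the information matrix equivalence (a hypothetical fully-observed Exponential model, not the paper's censored copula setting): under a correctly specified model, the expected negative Hessian of the log-likelihood equals the expected outer product of the score, and both can be estimated consistently from a sample.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                  # true rate of the Exponential(lam) model
x = rng.exponential(1.0 / lam, size=200_000)

# log f(x; lam) = log(lam) - lam * x
score = 1.0 / lam - x                      # d/d(lam) log f
neg_hess = np.full_like(x, 1.0 / lam**2)   # -d^2/d(lam)^2 log f (constant here)

I_hessian = neg_hess.mean()                # Hessian-based information estimate
I_outer = (score**2).mean()                # outer-product (squared score) estimate

print(I_hessian, I_outer)                  # both approximate 1/lam^2 = 0.25
```

Under misspecification the two estimates diverge, which is what an IR-type statistic exploits.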

### Related Content

The INFORMS Journal on Computing publishes high-quality papers that expand the scope of operations research and computing. It seeks original research papers on theory, methods, experiments, systems, and applications, as well as novel survey and tutorial papers and papers describing new and useful software tools. Official website: https://pubsonline.informs.org/journal/ijoc

Distributional regression is extended to Gaussian response vectors of dimension greater than two by parameterizing the covariance matrix $\Sigma$ of the response distribution using the entries of its Cholesky decomposition. The more common variance-correlation parameterization limits such regressions to bivariate responses -- higher dimensions require complicated constraints among the correlations to ensure positive definite $\Sigma$ and a well-defined probability density function. In contrast, Cholesky-based parameterizations ensure positive definiteness for all distributional dimensions no matter what values the parameters take, enabling estimation and regularization as for other distributional regression models. In cases where components of the response vector are assumed to be conditionally independent beyond a certain lag $r$, model complexity can be further reduced by setting Cholesky parameters beyond this lag to zero a priori. Cholesky-based multivariate Gaussian regression is first illustrated and assessed on artificial data and subsequently applied to a real-world 10-dimensional weather forecasting problem. There the regression is used to obtain reliable joint probabilities of temperature across ten future times, leveraging temporal correlations over the prediction period to obtain more precise and meteorologically consistent probabilistic forecasts.
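The key property the abstract relies on can be shown in a few lines: mapping unconstrained parameters to the entries of a Cholesky factor $L$ (with a positive diagonal via an exponential link) yields $\Sigma = LL^\top$ that is positive definite for any parameter values. A minimal sketch, with hypothetical function and variable names:

```python
import numpy as np

def sigma_from_cholesky(log_diag, off_diag):
    """Build a positive definite covariance from unconstrained Cholesky entries.

    log_diag: length-d array, logs of the diagonal of the Cholesky factor L
    off_diag: length d*(d-1)/2 array, strictly lower-triangular entries of L
    """
    d = len(log_diag)
    L = np.zeros((d, d))
    L[np.diag_indices(d)] = np.exp(log_diag)   # positive diagonal -> L nonsingular
    L[np.tril_indices(d, k=-1)] = off_diag     # unconstrained below the diagonal
    return L @ L.T                             # Sigma = L L^T is positive definite

rng = np.random.default_rng(1)
d = 10
Sigma = sigma_from_cholesky(rng.normal(size=d), rng.normal(size=d * (d - 1) // 2))
print(np.all(np.linalg.eigvalsh(Sigma) > 0))
```

No joint constraints among the parameters are needed, which is exactly what makes the parameterization convenient for regression on each entry.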

Consider the task of matrix estimation in which a dataset $X \in \mathbb{R}^{n\times m}$ is observed with sparsity $p$, and we would like to estimate $\mathbb{E}[X]$, where $\mathbb{E}[X_{ui}] = f(\alpha_u, \beta_i)$ for some Hölder smooth function $f$. We consider the setting where the row covariates $\alpha$ are unobserved yet the column covariates $\beta$ are observed. We provide an algorithm and accompanying analysis which shows that our algorithm improves upon naively estimating each row separately when the number of rows is not too small. Furthermore, when the matrix is moderately proportioned, our algorithm achieves the minimax optimal nonparametric rate of an oracle algorithm that knows the row covariates. In simulated experiments we show that our algorithm outperforms other baselines in low-data regimes.
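The "naively estimating each row separately" baseline mentioned above can be sketched concretely: since $\beta$ is observed, each row can be smoothed over the column covariates on its own observed entries. This is a hypothetical toy implementation of that baseline (Nadaraya-Watson smoothing), not the paper's algorithm:

```python
import numpy as np

def rowwise_kernel_estimate(X, beta, mask, h=0.1):
    """Estimate E[X[u]] for each row separately by kernel smoothing over the
    observed column covariates beta, using only row u's observed entries."""
    n, m = X.shape
    # Gaussian kernel weights between every pair of column covariates
    W = np.exp(-0.5 * ((beta[:, None] - beta[None, :]) / h) ** 2)
    est = np.empty((n, m))
    for u in range(n):
        w = W[:, mask[u]]                          # weights to observed columns
        est[u] = w @ X[u, mask[u]] / w.sum(axis=1)
    return est

rng = np.random.default_rng(3)
n, m = 40, 60
alpha, beta = rng.uniform(size=n), rng.uniform(size=m)
F = alpha[:, None] + 0.5 * np.sin(2 * np.pi * beta)[None, :]  # smooth f(alpha, beta)
X = F + rng.normal(0.0, 1.0, size=(n, m))                     # noisy observations
mask = rng.uniform(size=(n, m)) < 0.7                         # sparsity p = 0.7

est = rowwise_kernel_estimate(X, beta, mask)
print(np.mean((est - F) ** 2) < np.mean((X - F) ** 2))        # smoothing helps
```

The paper's contribution is to do better than this by borrowing strength across rows despite the row covariates being unobserved.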

Modern data collection methods and computational tools have made it possible to monitor high-dimensional processes. In this article, Phase II monitoring of high-dimensional processes is investigated when the number of samples collected in Phase I is limited relative to the number of variables. A new charting statistic for high-dimensional multivariate processes based on the diagonal elements of the underlying covariance matrix is introduced, and a unified procedure for Phases I and II employing a self-starting control chart is proposed. To remedy the effect of outliers, we adopt a robust procedure for parameter estimation in Phase I and introduce appropriate consistent estimators. The statistical performance of the proposed method in Phase II is evaluated through the average run length (ARL) criterion, in both the absence and presence of outliers, and the results reveal that the proposed control chart scheme effectively detects various kinds of shifts in the process mean. Finally, we illustrate the applicability of our proposed method via a real-world example.
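To see why a diagonal-based statistic is attractive when Phase I samples are scarce: with more variables than samples the full covariance matrix is singular and cannot be inverted, but its diagonal always can be. A hypothetical simplified charting statistic along these lines (not the paper's exact statistic):

```python
import numpy as np

def diag_t2(sample, mu0, diag_var):
    """Hotelling-type statistic that inverts only the diagonal of the
    covariance matrix, so it remains well-defined when p >> n."""
    n = sample.shape[0]
    xbar = sample.mean(axis=0)
    return n * np.sum((xbar - mu0) ** 2 / diag_var)

rng = np.random.default_rng(2)
p, n = 50, 5                          # high dimension, small subgroup size
mu0, var0 = np.zeros(p), np.ones(p)

in_control = diag_t2(rng.normal(0.0, 1.0, size=(n, p)), mu0, var0)
shifted = diag_t2(rng.normal(1.0, 1.0, size=(n, p)), mu0, var0)  # mean shift
print(in_control, shifted)
```

A shift in the process mean inflates the statistic well beyond its in-control level, which is what the control chart signals on.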

This paper provides a new semi-parametric estimator for LARCH($\infty$) processes, and therefore also for LARCH(p) or GLARCH(p, q) processes. The estimator is obtained by minimizing a contrast, leading to a least squares estimator of the absolute values of the process. Strong consistency and asymptotic normality are shown, and the convergence occurs at rate $\sqrt{n}$ in both the short- and long-memory cases. Numerical experiments confirm the theoretical results and show that this new estimator clearly outperforms the smoothed quasi-maximum likelihood estimators and the weighted least squares estimators often used for such processes.

In this paper we address the computational feasibility of the class of decision theoretic models referred to as adversarial risk analyses (ARA). These are models where a decision must be made with consideration for how an intelligent adversary may behave, and where the decision-making process of the adversary is unknown and is elicited by analyzing the adversary's decision problem using priors on his utility function and beliefs. The motivation of this research was to develop a computational algorithm that can be applied across a broad range of ARA models; to the best of our knowledge, no such algorithm currently exists. Using a two-person sequential model, we incrementally increase the size of the model and develop a simulation-based approximation of the true optimum where an exact solution is computationally impractical. In particular, we begin with a relatively large decision space by considering a theoretically continuous space that must be discretized. Then, we incrementally increase the number of strategic objectives, which causes the decision space to grow exponentially. The problem is exacerbated by the presence of an intelligent adversary who also must solve an exponentially large decision problem according to some unknown decision-making process. Nevertheless, using a stylized example that can be solved analytically, we show that our algorithm not only solves large ARA models quickly but also accurately identifies the true optimal solution. Furthermore, the algorithm is sufficiently general that it can be applied to any ARA model with a large, yet finite, decision space.

For the class of Gauss-Markov processes we study the problem of asymptotic equivalence of the nonparametric regression model with errors given by the increments of the process and the continuous time model, where a whole path of a sum of a deterministic signal and the Gauss-Markov process can be observed. In particular we provide sufficient conditions such that asymptotic equivalence of the two models holds for functions from a given class, and we verify these for the special cases of Sobolev ellipsoids and H\"older classes with smoothness index $> 1/2$ under mild assumptions on the Gauss-Markov process at hand. To derive these results, we develop an explicit characterization of the reproducing kernel Hilbert space associated with the Gauss-Markov process, that hinges on a characterization of such processes by a property of the corresponding covariance kernel introduced by Doob. In order to demonstrate that the given assumptions on the Gauss-Markov process are in some sense sharp we also show that asymptotic equivalence fails to hold for the special case of Brownian bridge. Our findings demonstrate that the well-known asymptotic equivalence of the Gaussian white noise model and the nonparametric regression model with i.i.d. standard normal errors can be extended to a result treating general Gauss-Markov noises in a unified manner.

Photoactivated localization microscopy (PALM) is a powerful imaging technique for characterization of protein organization in biological cells. Due to the stochastic blinking of fluorescent probes and camera discretization effects, each protein gives rise to a cluster of artificial observations. These blinking artifacts are an obstacle for quantitative analysis of PALM data, and tools for their correction are in high demand. We develop the Independent Blinking Cluster point process (IBCpp) family of models, which is suited for modeling data from single-molecule localization microscopy modalities, and we present results on the mark correlation function. We then construct the PALM-IBCpp, a semiparametric IBCpp tailored to PALM data, and describe a procedure for parameter estimation that can be used without parametric assumptions on the spatial organization of proteins. Our model is validated on nuclear pore complex reference data, where the ground truth was accurately recovered, and we demonstrate how the estimated blinking parameters can be used to perform a blinking-corrected test for protein clustering in a cell expressing the adaptor protein LAT. Finally, we consider simulations with varying degrees of blinking and protein clustering to shed light on the expected performance in a range of realistic settings.

Compared to the nominal scale, the ordinal scale for a categorical outcome variable has the property of making a monotonicity assumption for the covariate effects meaningful. This assumption is encoded in the commonly used proportional odds model, but there it is combined with other parametric assumptions such as linearity and additivity. Herein, the considered models are non-parametric and the only condition imposed is that the effects of the covariates on the outcome categories are stochastically monotone according to the ordinal scale. We are not aware of the existence of other comparable multivariable models that would be suitable for inference purposes. We generalize our previously proposed Bayesian monotonic multivariable regression model to ordinal outcomes, and propose an estimation procedure based on reversible jump Markov chain Monte Carlo. The model is based on a marked point process construction, which allows it to approximate arbitrary monotonic regression function shapes, and has a built-in covariate selection property. We study the performance of the proposed approach through extensive simulation studies, and demonstrate its practical application in two real data examples.

Performing causal inference in observational studies requires assuming that confounding variables are correctly adjusted for. G-computation methods are often used in these scenarios, with several recent proposals using Bayesian versions of g-computation. In settings with few confounders, standard models can be employed; however, as the number of confounders increases, these models become less feasible because there are fewer observations available for each unique combination of confounding variables. In this paper we propose a new model for estimating treatment effects in observational studies that incorporates both parametric and nonparametric outcome models. By conceptually splitting the data, we can combine these models while maintaining a conjugate framework, allowing us to avoid the use of MCMC methods. Approximations using the central limit theorem and random sampling allow our method to scale to high-dimensional confounders while maintaining computational efficiency. We illustrate the model using carefully constructed simulation studies, and compare its computational costs to those of other benchmark models.

Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
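The "add common and group-specific random measures, then normalise" construction can be sketched in a finite-dimensional toy version: replace the completely random measures with independent gamma masses on a shared set of atoms. The shared component induces positive dependence between the resulting group-level probability vectors. All names and shape parameters below are illustrative choices, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(4)
k = 8               # number of shared atoms (finite-dimensional toy version)
a0, a1 = 1.0, 1.0   # shape parameters of the common and group-specific parts

def draw_group_probs(n_groups=2):
    """Draw dependent random probability vectors by normalizing the sum of a
    common and a group-specific vector of gamma masses (a toy stand-in for
    completely random measures)."""
    common = rng.gamma(a0, size=k)             # shared across all groups
    groups = []
    for _ in range(n_groups):
        specific = rng.gamma(a1, size=k)       # group-specific component
        masses = common + specific
        groups.append(masses / masses.sum())   # normalize -> probability vector
    return groups

p1, p2 = draw_group_probs()
print(p1.sum(), p2.sum())   # each normalizes to 1
```

Increasing the common shape `a0` relative to `a1` pushes the groups toward homogeneity; shrinking it pushes them toward independence, mirroring the dependence range described above.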

Thomas Muschinski, Georg J. Mayr, Thorsten Simon, Nikolaus Umlauf, Achim Zeileis
Christina Lee Yu
Jean-Marc Bardet
Louis G. Jensen, David J. Williamson, Ute Hahn
Olli Saarela, Christian Rohrbeck, Elja Arjas
Federico Camerlenghi, David B. Dunson, Antonio Lijoi, Igor Prünster, Abel Rodríguez
