We show a statistical version of Taylor's theorem and apply this result to non-parametric density estimation from truncated samples, which is a classical challenge in statistics \cite{woodroofe1985estimating, stute1993almost}. The single-dimensional version of our theorem has the following implication: "For any distribution $P$ on $[0, 1]$ with a smooth log-density function, given samples from the conditional distribution of $P$ on $[a, a + \varepsilon] \subset [0, 1]$, we can efficiently identify an approximation to $P$ over the \emph{whole} interval $[0, 1]$, with quality of approximation that improves with the smoothness of $P$." To the best of our knowledge, our result is the first in the area of non-parametric density estimation from truncated samples that works under the hard truncation model, where samples outside some survival set $S$ are never observed, and that applies to multiple dimensions. In contrast, previous works assume single-dimensional data where each sample has a different survival set $S$, so that samples from the whole support will ultimately be collected.
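To make the hard truncation model concrete, the single-dimensional setting can be sketched as follows. The symbols $p$, $f$, and $p_S$ are illustrative notation assumed here and are not fixed by the text above. Suppose $P$ has density $p(x) = e^{f(x)} / \int_0^1 e^{f(y)}\,dy$ on $[0, 1]$ with a smooth log-density $f$, and let $S = [a, a + \varepsilon]$. Samples are then observed only from the conditional density
\[
  p_S(x) \;=\; \frac{e^{f(x)}\,\mathbf{1}\{x \in S\}}{\int_S e^{f(y)}\,dy}, \qquad x \in [0, 1].
\]
The normalizing constant of $p$ cancels in this ratio, so the observed samples carry information about $f$ only on $S$ and only up to an additive constant; that constant is then determined by requiring the extrapolated density to integrate to one over $[0, 1]$. Recovering $p$ outside $S$ is therefore an extrapolation problem, which is where the smoothness of the log-density enters.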