The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram-based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with a known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples showing that similar results hold for mutual information and relative entropy as well.
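As a concrete illustration, consider the generic plug-in form of such a histogram-based estimator (a sketch of the standard construction; the exact estimator and the constants in the confidence bounds are those specified in the body of the paper). Partition the known bounded support into cells $A_1,\dots,A_m$ of equal volume $\Delta$, and let $n_i$ denote the number of the $n$ samples falling in $A_i$. The plug-in estimate of the differential entropy is then
\[
\hat{h}_n \;=\; -\sum_{i\,:\,n_i>0} \frac{n_i}{n}\,\log\frac{n_i}{n\,\Delta},
\]
i.e., the discrete entropy of the empirical cell probabilities plus the correction term $\log\Delta$ accounting for the cell volume.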