The histogram is a key method for visualizing data and estimating the underlying probability distribution. Over- and under-binning lead to incorrect conclusions about the data. A new method based on the Shannon entropy of the histogram uses a simple formula based on the differential entropy estimated from nearest-neighbour distances. Links are made between the new method and other algorithms, such as Scott's formula and cost- and risk-function methods. A parameter is identified that predicts over- and under-binning and can be estimated for any histogram. The new algorithm is shown to be robust by applying it to real data.
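The sketch below is not the paper's formula. It only illustrates the ingredients named above, under stated assumptions. First, it uses the Kozachenko-Leonenko nearest-neighbour estimator (k = 1) of differential entropy, which is assumed here to be representative of "differential entropy estimated from nearest-neighbour distances". Second, it uses the standard approximation H ≈ h − log Δ, which links the Shannon entropy H of an equal-width histogram with bin width Δ to the differential entropy h. Scott's normal-reference rule, Δ = 3.49 s N^(-1/3), is included for comparison. All function names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def nn_differential_entropy(xs):
    """Kozachenko-Leonenko (k = 1) differential-entropy estimate, in nats, for 1-D data.

    Assumption: this estimator stands in for the nearest-neighbour estimate the
    abstract refers to; the paper's exact estimator is not given there.
    """
    s = sorted(xs)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two samples")
    log_eps_sum = 0.0
    for i, x in enumerate(s):
        # In sorted 1-D data the nearest neighbour is one of the two adjacent points.
        left = x - s[i - 1] if i > 0 else math.inf
        right = s[i + 1] - x if i < n - 1 else math.inf
        d = min(left, right)
        if d == 0.0:
            raise ValueError("duplicate values give a zero nearest-neighbour distance")
        log_eps_sum += math.log(d)
    # h ~= psi(N) - psi(1) + log(c_1) + mean(log eps), where c_1 = 2 is the length of [-1, 1].
    return digamma_int(n) - digamma_int(1) + math.log(2.0) + log_eps_sum / n


def histogram_entropy(xs, bins):
    """Shannon entropy (nats) of an equal-width histogram over [min, max], plus the bin width."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("data must not be constant")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        j = min(int((x - lo) / width), bins - 1)  # the maximum goes into the last bin
        counts[j] += 1
    n = len(xs)
    h = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return h, width


def scott_bin_width(xs):
    """Scott's normal-reference bin width: 3.49 * sample std * N^(-1/3)."""
    return 3.49 * statistics.stdev(xs) * len(xs) ** (-1.0 / 3.0)


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(2000)]
    exact = 0.5 * math.log(2 * math.pi * math.e)  # differential entropy of N(0, 1)
    print(f"nearest-neighbour estimate: {nn_differential_entropy(data):.3f} nats (exact {exact:.3f})")
    print(f"Scott bin width: {scott_bin_width(data):.3f}")
    for k in (5, 10, 20, 40, 80, 160):
        H, w = histogram_entropy(data, k)
        print(f"bins={k:4d}  H={H:.3f}  H + log(width)={H + math.log(w):.3f}")
```

For moderate bin counts, H + log Δ should track the nearest-neighbour estimate. In the limit of very many bins, each sample sits in its own bin, so H saturates at log N while log Δ keeps falling. This gives some intuition for why comparing the two quantities could flag over-binning. It is offered as intuition only, not as the paper's parameter.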