Background and Objective: Histograms and Pearson's coefficient of variation are among the most popular summary statistics. Researchers judge the shape of a quantitative data distribution by visually inspecting histograms, and they take the coefficient of variation as an estimator of the relative variability of the data. We explore the properties of histograms and the coefficient of variation through examples in R and offer better alternatives: density plots and Eisenhauer's relative dispersion coefficient. Methods: Hypothetical examples developed in R are used to create histograms and density plots and to compute the coefficient of variation and the relative dispersion coefficient. Results: These hypothetical examples show that the two traditional approaches are flawed. Histograms are incapable of reflecting the distribution of probabilities. The coefficient of variation has issues when negative and positive values occur in the same dataset, is sensitive to outliers, and is severely affected by the mean value of a distribution. Potential replacements are explained and applied for contrast. Conclusions: With modern computers and the R language, it is easy to replace histograms with density plots, which are able to approximate the theoretical probability distribution. In addition, Eisenhauer's relative dispersion coefficient is suggested as a suitable estimator of relative variability, including corrections for lower and upper bounds.
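The behaviours named in the Results can be reproduced with a few lines of base R. The sketch below is illustrative only: the data are hypothetical and are not the paper's examples, and the helper names `cv` and `rel_disp` are introduced here. The abstract gives no formula for Eisenhauer's coefficient, so `rel_disp` is a stand-in assumption: it divides the population standard deviation by the largest value it can take for a given mean and given bounds (the Bhatia-Davis bound), and it may differ from Eisenhauer's actual definition.

```r
# Pearson's coefficient of variation: sample SD divided by the mean.
cv <- function(x) sd(x) / mean(x)

# Hypothetical data, not taken from the paper.
x <- c(2, 4, 6, 8, 10)

cv_original <- cv(x)             # mean is 6
cv_shifted  <- cv(x + 100)       # same spread, mean is 106: CV shrinks
cv_centered <- cv(x - 6)         # negative and positive values, mean is 0: CV is not finite
cv_outlier  <- cv(c(x, 1000))    # a single extreme value inflates the CV

# ASSUMPTION: a bounded relative dispersion ratio standing in for
# Eisenhauer's coefficient. The denominator is the maximum population SD
# attainable for mean m on the interval [lower, upper].
rel_disp <- function(x, lower = min(x), upper = max(x)) {
  m <- mean(x)
  sqrt(mean((x - m)^2)) / sqrt((m - lower) * (upper - m))
}

rd_original <- rel_disp(x)
rd_shifted  <- rel_disp(x + 100) # unchanged by the shift in mean

# Histogram versus kernel density estimate on a hypothetical bimodal sample.
set.seed(1)
y <- c(rnorm(200, mean = 0), rnorm(200, mean = 4))
h_coarse <- hist(y, breaks = 5,  plot = FALSE)  # apparent shape depends on the bins
h_fine   <- hist(y, breaks = 30, plot = FALSE)
d <- density(y)                                 # smooth estimate, area close to 1

if (interactive()) {
  hist(y, freq = FALSE, breaks = 30, main = "Histogram with density overlay")
  lines(d, lwd = 2)
}
```

Comparing `cv_original` with `cv_shifted` shows how adding a constant changes the coefficient of variation although the spread is identical, whereas the bounded ratio stays the same because the bounds move with the data.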