Citation distributions are lognormal. We use 30 lognormally distributed synthetic series of numbers, which simulate real series of citations, to investigate the consistency of the h index. Using the lognormal cumulative distribution function, the equation that defines the h index can be formulated; this equation shows that h has a complex dependence on the number of papers (N). We also investigate the correlation between h and the number of papers exceeding various citation thresholds, from 5 to 500 citations. The best correlation is found for the threshold of 100 citations, but numerous data points deviate from the general trend. The size-independent indicator h/N shows no correlation with the probability of publishing a paper that exceeds any of the citation thresholds. In contrast with the h index, the total number of citations shows a high correlation with the number of papers exceeding the thresholds of 10 and 50 citations, and the mean number of citations correlates with the probability of publishing a paper that exceeds any citation threshold. Thus, in synthetic series, the total number of citations and the mean number of citations are much better indicators of research performance than h and h/N. We also discuss further difficulties that arise in real citation distributions.
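As an illustration of the defining equation mentioned above, the following is a minimal sketch and not the authors' code. It assumes the condition can be written as N · P(X ≥ h) = h, where X is lognormal with parameters mu and sigma. The function names, the flooring of sampled values to integer citation counts, and the parameter values (mu = 1.0, sigma = 1.1, N = 500) are all hypothetical choices made for this example.

```python
import math
import random


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def lognormal_sf(x, mu, sigma):
    """P(X >= x) for a lognormal variable with parameters mu, sigma (x > 0)."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def expected_h(n, mu, sigma):
    """Solve n * P(X >= h) = h for h by bisection (continuous approximation)."""
    lo, hi = 1e-9, float(n)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        # n * sf(h) - h decreases in h, so the root lies where the sign flips.
        if n * lognormal_sf(mid, mu, sigma) > mid:
            lo = mid
        else:
            hi = mid
    return lo


def synthetic_series(n, mu, sigma, rng):
    """Assumption: citation counts are lognormal draws floored to integers."""
    return [int(rng.lognormvariate(mu, sigma)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    mu, sigma, n = 1.0, 1.1, 500    # hypothetical parameters, not from the paper
    series = synthetic_series(n, mu, sigma, rng)
    print("empirical h:", h_index(series))
    print("h from the lognormal equation:", round(expected_h(n, mu, sigma), 2))
    print("total citations:", sum(series), "mean:", sum(series) / n)
```

Because h sits inside the cumulative distribution function, the equation has no closed-form solution for h in terms of N, mu, and sigma, which is one way to see the complex dependence on N. The sketch therefore solves it numerically.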