Understanding the innovation process, that is, the underlying mechanisms through which novelties emerge, diffuse, and trigger further novelties, is of fundamental importance in many areas (biology, linguistics, social science, and others). The models introduced so far satisfy Heaps' law, which concerns the rate at which novelties appear, and Zipf's law, which states a power-law behavior for the frequency distribution of the elements. However, some empirical cases are far from showing a pure power-law behavior, and the deviation occurs for elements with high frequencies. We explain this phenomenon by means of a suitable "damping" effect in the probability of repeating an old element. While the proposed model is extremely general and may also be employed in other contexts, it has been tested on some Twitter data sets, where it performed well with respect to Heaps' law and, above all, in fitting the frequency-rank plots at both low and high frequencies.
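For readers who want to see the two empirical quantities concretely, the following is a minimal sketch of how they can be measured on a sequence of elements (for instance, hashtags in order of appearance). It is not the model proposed here; the function names `heaps_curve` and `rank_frequency` and the toy sequence are illustrative assumptions. Heaps' law is usually checked by plotting the number of distinct elements D(t) against t on log-log axes, and Zipf's law by plotting frequency against rank in the same way.

```python
from collections import Counter


def heaps_curve(sequence):
    """Return D(t), the number of distinct elements among the first t items."""
    seen = set()
    curve = []
    for item in sequence:
        seen.add(item)          # a novelty enlarges the set, a repetition does not
        curve.append(len(seen))
    return curve


def rank_frequency(sequence):
    """Return element frequencies sorted in decreasing order (rank 1 first)."""
    return sorted(Counter(sequence).values(), reverse=True)


# Toy sequence (hypothetical data, for illustration only).
tokens = list("abacabadabacaba")
distinct_counts = heaps_curve(tokens)
frequencies = rank_frequency(tokens)
```

On real data, a deviation from a pure power law at high frequencies would show up as a bend in the head of the log-log frequency-rank plot, which is the regime the abstract refers to.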