We introduce the Generalized Rescaled Polya (GRP) urn, which provides a generative model for a chi-squared goodness-of-fit test for the long-term probabilities of clustered data, with independence between clusters and correlation, due to a reinforcement mechanism, within each cluster. We apply the proposed test to a data set of Twitter posts about the COVID-19 pandemic: in brief, under a classical chi-squared test the data are strongly significant for rejecting the null hypothesis (that the daily long-run sentiment rate remains constant), but, once the correlation among the data is taken into account, the proposed test leads to a different conclusion. Besides the statistical application, we point out that the GRP urn is a simple variant of the standard Eggenberger-Polya urn that, with suitable choices of the parameters, exhibits "local" reinforcement, almost sure convergence of the empirical mean to a deterministic limit, and different asymptotic behaviours of the predictive mean. Moreover, the study of this model provides the opportunity to analyze stochastic approximation dynamics that are unusual in the related literature.