Dealing with overdispersion in multivariate count data

The problem of overdispersion in multivariate count data is a challenging one. Nowadays it plays a central role, mainly because of the relevance of data produced by modern technologies, such as Next Generation Sequencing and textual data from the web or from digital collections. This work presents a comprehensive analysis of the likelihood-based models for extra-variation data proposed in the scientific literature, paying particular attention to the models that are feasible for high-dimensional data. A new approach is proposed, together with its parameter-estimation procedure. It is a deeper version of the Dirichlet-Multinomial distribution and yields important results, giving a better approximation of the observed variability. The models are compared in two different simulation studies, both of which confirm that the new model considered in this work achieves the best results.
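The abstract does not spell out the form of the new "deeper" model, so the sketch below illustrates only the standard Dirichlet-Multinomial (DM) baseline it extends, to make "overdispersion" concrete. It relies on the well-known DM moments: with $\alpha_0 = \sum_k \alpha_k$ and $\pi_k = \alpha_k/\alpha_0$, $\mathrm{Var}(X_k) = n\pi_k(1-\pi_k)\,\frac{n+\alpha_0}{1+\alpha_0}$, i.e. the multinomial variance inflated by a factor that exceeds 1 whenever $n > 1$. The function names (`dm_logpmf`, `dm_sample`, `dm_dispersion_factor`) and the parameter values are hypothetical illustrations, not the authors' code.

```python
import math
import random


# Assumption: standard Dirichlet-Multinomial only, NOT the paper's deeper model.
def dm_logpmf(x, alpha):
    """Log-pmf of the Dirichlet-Multinomial DM(n, alpha), with n = sum(x)."""
    n = sum(x)
    a0 = sum(alpha)
    logp = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        logp += math.lgamma(xk + ak) - math.lgamma(ak) - math.lgamma(xk + 1)
    return logp


def dm_sample(n, alpha, rng):
    """Draw one count vector: p ~ Dirichlet(alpha), then x ~ Multinomial(n, p)."""
    # Dirichlet draw via normalised Gamma(alpha_k, 1) variates.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    p = [gi / s for gi in g]
    counts = [0] * len(alpha)
    for k in rng.choices(range(len(alpha)), weights=p, k=n):
        counts[k] += 1
    return counts


def dm_dispersion_factor(n, alpha):
    """Ratio Var_DM(X_k) / Var_Multinomial(X_k) = (n + alpha_0) / (1 + alpha_0)."""
    a0 = sum(alpha)
    return (n + a0) / (1 + a0)


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    n, alpha = 50, [2.0, 3.0, 5.0]        # hypothetical example parameters
    pi0 = alpha[0] / sum(alpha)
    mult_var = n * pi0 * (1 - pi0)        # multinomial variance of X_0
    dm_x0 = [dm_sample(n, alpha, rng)[0] for _ in range(5000)]
    print("multinomial Var(X_0):", mult_var)
    print("theoretical DM Var(X_0):", mult_var * dm_dispersion_factor(n, alpha))
    print("empirical DM Var(X_0):", round(sample_variance(dm_x0), 2))
```

With these parameters the multinomial variance of the first count is 8, while the DM factor is 60/11, so the simulated variance should land several times higher. That gap is the extra variation a plain multinomial likelihood cannot absorb, which is the starting point for the richer models the work compares.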

