This article is a case study illustrating the use of a multivariate statistical method to screen potential chemical markers for the early detection of post-harvest disease in stored fruit. We simultaneously measure a range of volatile organic compounds (VOCs) and two measures of the severity of disease infection in apples under storage: the number of apples presenting visible symptoms and the lesion area. We use multivariate generalised linear mixed models (MGLMMs) to study the association patterns of these simultaneously observed responses via the covariance structure of the random components. Notably, MGLMMs can represent patterns of association between quantities of different statistical nature. In the example considered in this paper, there are positive responses (VOC concentrations, modelled with Gamma distributions), non-negative responses that may contain zero-valued observations (lesion area, modelled with compound Poisson distributions) and binomially distributed responses (the proportion of apples presenting infection symptoms). We represent the patterns of association inferred with the MGLMMs using graphical models (networks represented by graphs), which allow us to eliminate spurious associations arising from a cascade of indirect correlations between the responses.
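The way a graphical model removes indirect associations can be illustrated with a small numerical sketch. It is not the authors' procedure: it assumes a Gaussian graphical model on a made-up 3x3 covariance matrix of random components, whereas in the paper that covariance would be inferred from the fitted MGLMM. The labels `voc_a`, `lesion_area` and `voc_b`, the value of `rho`, and the helper names are hypothetical. The covariance is built as a chain, so `voc_a` and `voc_b` are correlated only through `lesion_area`.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation of each pair of variables given all the others,
    computed from the inverse covariance (precision) matrix."""
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def graph_edges(pcor, tol=1e-8):
    """Pairs (i, j), i < j, whose partial correlation is not numerically zero."""
    n = pcor.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(pcor[i, j]) > tol]

# Hypothetical labels and covariance (assumptions for illustration only).
labels = ["voc_a", "lesion_area", "voc_b"]
rho = 0.8
# Chain structure: neighbours correlate at rho, the two ends at rho**2.
cov = np.array([
    [1.0,      rho, rho ** 2],
    [rho,      1.0, rho     ],
    [rho ** 2, rho, 1.0     ],
])

marginal = cov  # unit variances, so the covariance is also the correlation matrix
pcor = partial_correlations(cov)
edges = graph_edges(pcor)

print("marginal correlation voc_a ~ voc_b:", marginal[0, 2])
print("partial correlation  voc_a ~ voc_b:", round(float(pcor[0, 2]), 10))
print("edges:", [(labels[i], labels[j]) for i, j in edges])
```

In this toy case the marginal correlation between the two ends of the chain is clearly non-zero, but their partial correlation vanishes once the middle variable is conditioned on, so the graph keeps only the two direct edges. With estimated covariances the partial correlations are never exactly zero, so a real analysis would need a statistical criterion rather than the fixed tolerance used here.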