Identifiability of discrete statistical models with latent variables is challenging to study, yet crucial to a model's interpretability and reliability. This work presents a general algebraic technique for investigating the identifiability of complicated discrete models with latent and graphical components. Specifically, motivated by diagnostic tests that collect multivariate categorical data, we focus on discrete models with multiple binary latent variables. In the considered model, the latent variables can have arbitrary dependencies among themselves, while the latent-to-observed measurement graph takes a "star-forest" shape. We establish necessary and sufficient graphical criteria for identifiability, and reveal an interesting and perhaps surprising phenomenon of blessing-of-dependence geometry: under the minimal conditions for generic identifiability, the parameters are identifiable if and only if the latent variables are not statistically independent. Thanks to this theory, we can perform formal hypothesis tests of identifiability in the boundary case by testing certain marginal independence of the observed variables. Our results provide a new understanding of the statistical properties of graphical models with latent variables. They also have useful implications for designing diagnostic tests or surveys that measure binary latent traits.
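The remark that identifiability in the boundary case can be checked by testing marginal independence of observed variables can be illustrated with a small simulation. The sketch below is not the paper's test procedure; it is a minimal illustration under stated assumptions. It assumes a hypothetical toy setting with two binary latent variables, each with one observed binary child, and applies a standard Pearson chi-square test of independence to the two observed variables. The function names (`simulate`, `chi2_independence_2x2`), the item response probabilities, and the latent joint distributions are all made up for illustration.

```python
import math
import random


def simulate(n, p_joint, p_item, seed=0):
    """Draw n pairs (y1, y2), where y1 is a child of latent a1 and y2 of latent a2.

    p_joint: dict mapping (a1, a2) -> probability (hypothetical latent joint law)
    p_item:  p_item[a] = P(y = 1 | parent = a); shared by both items for simplicity
    """
    rng = random.Random(seed)
    states = list(p_joint)
    weights = [p_joint[s] for s in states]
    data = []
    for _ in range(n):
        a1, a2 = rng.choices(states, weights=weights)[0]
        y1 = int(rng.random() < p_item[a1])
        y2 = int(rng.random() < p_item[a2])
        data.append((y1, y2))
    return data


def chi2_independence_2x2(pairs):
    """Pearson chi-square test of independence for two binary variables.

    Returns (statistic, p_value). The p-value is the chi-square(1) upper tail,
    which equals erfc(sqrt(x / 2)). Assumes every row and column of the
    2x2 table is non-empty.
    """
    n = len(pairs)
    counts = [[0, 0], [0, 0]]
    for u, v in pairs:
        counts[u][v] += 1
    row = [sum(counts[i]) for i in range(2)]
    col = [counts[0][j] + counts[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (counts[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical parameters: items respond with probability 0.8 / 0.2.
p_item = {0: 0.2, 1: 0.8}
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

stat_dep, p_dep = chi2_independence_2x2(simulate(5000, dependent, p_item))
stat_ind, p_ind = chi2_independence_2x2(simulate(5000, independent, p_item, seed=1))
print(f"dependent latents:   chi2 = {stat_dep:.1f}, p = {p_dep:.3g}")
print(f"independent latents: chi2 = {stat_ind:.1f}, p = {p_ind:.3g}")
```

In this toy setting, observed variables attached to different latent variables are marginally dependent only when their latent parents are dependent, so rejecting marginal independence is evidence that the latent variables are not independent, the regime in which the abstract states the parameters are identifiable.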