Recent work has shown that finite mixture models with $m$ components are identifiable, without any assumptions on the mixture components, provided one has access to groups of $2m-1$ samples, each group known to come from a single mixture component. In this work we generalize that result and show that, if every subset of $k$ mixture components of a mixture model is linearly independent, then that mixture model is identifiable with only $(2m-1)/(k-1)$ samples per group. We further show that this value cannot be improved. We prove an analogous result, along with a corresponding lower bound, for a stronger form of identifiability known as "determinedness". This independence assumption holds almost surely if the mixture components are chosen randomly from a $k$-dimensional space. We describe some implications of our results for multinomial mixture models and topic modeling.
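As a rough illustration of the quantities in the abstract, the following sketch computes the group size $(2m-1)/(k-1)$ and numerically checks whether every subset of $k$ components is linearly independent. It is not taken from the paper. It assumes the components can be represented as finite-dimensional vectors, uses random Gaussian vectors as stand-ins for randomly chosen components, and relies on a numerical tolerance for the rank test. All function names (`samples_per_group`, `rank`, `every_k_subset_independent`, `random_components`) are hypothetical.

```python
import itertools
import random
from fractions import Fraction


def samples_per_group(m, k):
    """Group size (2m - 1) / (k - 1) from the abstract, as an exact fraction."""
    if m < 1 or k < 2:
        raise ValueError("need m >= 1 and k >= 2")
    return Fraction(2 * m - 1, k - 1)


def rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting.

    The tolerance is an assumption of this sketch, not part of the result.
    """
    a = [list(r) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        # Pick the largest remaining entry in column c as the pivot.
        pivot = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, n_rows):
            f = a[i][c] / a[r][c]
            for j in range(c, n_cols):
                a[i][j] -= f * a[r][j]
        r += 1
    return r


def every_k_subset_independent(components, k):
    """True if every subset of k component vectors has rank k."""
    return all(rank(s) == k for s in itertools.combinations(components, k))


def random_components(m, k, seed=0):
    """m random Gaussian vectors in a k-dimensional space (illustrative stand-ins)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(m)]


if __name__ == "__main__":
    m, k = 5, 4
    comps = random_components(m, k)
    print("samples per group:", samples_per_group(m, k))
    print("every k-subset independent:", every_k_subset_independent(comps, k))
```

With $k = 2$ the formula reduces to $2m-1$, the group size of the earlier result quoted in the first sentence. The brute-force subset check grows combinatorially in $m$ and is only meant for small examples.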