The Bayesian information criterion (BIC), defined as the observed-data log-likelihood minus a penalty term based on the sample size $N$, is a popular model selection criterion for factor analysis with complete data. The same definition has also been suggested for incomplete data. However, its penalty term uses the `complete' sample size $N$ whether the data are complete or incomplete. With incomplete data, there are often only $N_i<N$ observations for variable $i$, so using the `complete' sample size $N$ implausibly ignores the amount of missing information inherent in incomplete data. Motivated by this observation, a novel criterion called hierarchical BIC (HBIC) is proposed for factor analysis with incomplete data. Its novelty is that the penalty term uses only the actual amounts of observed information, namely the $N_i$'s. Theoretically, it is shown that HBIC is a large-sample approximation of the variational Bayesian (VB) lower bound and that BIC is a further approximation of HBIC, which means that HBIC shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to assess the finite-sample performance of HBIC, BIC, and related criteria under various missing rates. The results show that HBIC and BIC perform similarly when the missing rate is small, but HBIC is more accurate when the missing rate is not small.
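To make the contrast between the two penalty terms concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common BIC form (log-likelihood minus half the parameter count times $\log N$) and, as an assumption not stated above, that the HBIC penalty sums a per-variable parameter count $d_i$ weighted by $\log N_i$. The function names, the counts $d_i$, and all numbers are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def bic(loglik: float, n_params: int, n: int) -> float:
    """Log-likelihood minus a penalty based on the 'complete' sample size n."""
    return loglik - 0.5 * n_params * math.log(n)


def hbic_sketch(
    loglik: float,
    params_per_variable: Sequence[int],
    obs_per_variable: Sequence[int],
) -> float:
    """Assumed HBIC-style criterion: variable i contributes d_i * log(N_i).

    params_per_variable holds hypothetical per-variable parameter counts d_i;
    obs_per_variable holds the observed counts N_i. The exact form used in
    the paper is not given in the text above, so this is an assumption.
    """
    if len(params_per_variable) != len(obs_per_variable):
        raise ValueError("need one parameter count per variable")
    penalty = 0.5 * sum(
        d_i * math.log(n_i)
        for d_i, n_i in zip(params_per_variable, obs_per_variable)
    )
    return loglik - penalty


# Illustrative values only (not taken from the paper).
loglik = -100.0
d = [3, 3, 3]              # hypothetical parameters per variable
n_complete = 50            # 'complete' sample size N
n_observed = [50, 30, 20]  # observed counts N_i <= N

bic_value = bic(loglik, sum(d), n_complete)
hbic_complete = hbic_sketch(loglik, d, [n_complete] * len(d))
hbic_missing = hbic_sketch(loglik, d, n_observed)
```

Under these assumptions the two criteria coincide when every $N_i = N$, and the HBIC-style penalty shrinks as values go missing, which matches the qualitative behaviour described above.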