We set out a novel bottom-up procedure to aggregate, or cluster, cells with small frequency counts in a two-way classification while maintaining dependence in the table. The procedure is model-free. It combines cells in a table into clusters based on independent log odds ratios. We use this procedure to build a set of statistically efficient and robust imputation cells for imputing missing values of a continuous variable using a pair of classification variables. A useful feature of the procedure is that it forms aggregation groups that are homogeneous with respect to the cell response mean. Using a series of simulation studies, we show that IlocA groups together only independent cells and does so in a consistent and credible way. When imputing missing data, we show that IlocA generates close to an optimal number of imputation cells. For ignorable non-response, the resulting imputed means are generally accurate. With non-ignorable missingness, results are consistent with those obtained elsewhere. We close with a case study applying our method to imputing missing building energy performance data.
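The abstract does not spell out how the log odds ratios are computed, so the following is only a minimal sketch of the quantity involved: the standard local log odds ratio of each adjacent 2x2 subtable in a two-way table of counts. It is not the IlocA procedure itself. The function name `local_log_odds_ratios` and the optional `correction` added to every cell (0.5 is a common continuity correction for small counts) are assumptions made for illustration.

```python
import math


def local_log_odds_ratios(table, correction=0.5):
    """Local log odds ratios for an I x J table of counts.

    Hypothetical helper for illustration only, not the authors' code.
    Entry [i][j] of the result is the log odds ratio of the 2x2 subtable
    formed by rows i, i+1 and columns j, j+1. `correction` is added to
    every cell so that zero counts do not produce log(0) or division by zero.
    """
    n_rows, n_cols = len(table), len(table[0])
    result = []
    for i in range(n_rows - 1):
        row = []
        for j in range(n_cols - 1):
            a = table[i][j] + correction
            b = table[i][j + 1] + correction
            c = table[i + 1][j] + correction
            d = table[i + 1][j + 1] + correction
            row.append(math.log((a * d) / (b * c)))
        result.append(row)
    return result


# Made-up counts: the first two rows are proportional, the third is not.
counts = [
    [10, 20, 5],
    [20, 40, 10],
    [3, 30, 1],
]
lors = local_log_odds_ratios(counts, correction=0.0)
```

In this made-up table the first two rows are proportional, so the log odds ratios involving only those rows are zero, which is what independence among those cells looks like. The subtables involving the third row give non-zero values. How such values are tested and used to decide which cells to merge is specific to the procedure described above and is not reproduced here.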