The confidentiality of data and information is of great importance for many companies and organizations. For this reason, they may prefer not to release exact data, but instead to grant researchers access to approximate data. For example, rather than providing the exact income of their clients, they may provide researchers only with grouped data, that is, the number of clients falling in each of a set of non-overlapping income intervals. The challenge is to estimate the mean and variance structure of the hidden ungrouped data from the observed grouped data. To tackle this problem, this work considers the exact observed-data likelihood and applies the Expectation-Maximization (EM) and Monte-Carlo EM (MCEM) algorithms for cases where the hidden data follow a univariate, bivariate, or multivariate normal distribution. The results are then compared with those obtained by ignoring the grouping and applying regular maximum likelihood. The well-known Galton data and simulated datasets are used to evaluate the properties of the proposed EM and MCEM algorithms.