Nowadays, data confidentiality is of great importance to many companies and organizations. For this reason, they may prefer not to release exact data and instead grant researchers access only to approximate data. For example, rather than providing the exact measurements of their clients, they may provide researchers only with grouped data, that is, the number of clients falling in each of a set of non-overlapping measurement intervals. The challenge is to estimate the mean and variance-covariance structure of the hidden ungrouped data from the observed grouped data. To tackle this problem, this work considers the exact observed-data likelihood and applies the Expectation-Maximization (EM) and Monte Carlo EM (MCEM) algorithms to cases where the hidden data follow a univariate, bivariate, or multivariate normal distribution. Simulation studies are conducted to evaluate the performance of the proposed EM and MCEM algorithms, and the well-known Galton data set is used as an application example.
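To make the univariate case concrete, here is a minimal sketch of one standard way to run EM on grouped normal data. It is not necessarily the authors' implementation. In the E-step, each hidden value in the interval (a_j, b_j] is replaced by its conditional first and second moments under a normal distribution truncated to that interval. These moments have closed forms in terms of the standard normal density and CDF. The M-step is then the usual complete-data MLE computed from the expected sufficient statistics. The function names (`em_grouped_normal`, `group`), the interval edges, and the simulated data are all hypothetical and chosen only for illustration. In the bivariate and multivariate cases these conditional moments are no longer available in simple closed form, which is where a Monte Carlo E-step (MCEM) becomes useful. That extension is not shown here.

```python
import math
import random

def norm_pdf(z):
    """Standard normal density, with the limit 0 at +/-infinity."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF via the error function (handles +/-infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_pdf(z):
    """z * phi(z), with the limit 0 at +/-infinity (avoids inf * 0 = nan)."""
    return 0.0 if math.isinf(z) else z * norm_pdf(z)

def em_grouped_normal(edges, counts, mu, sigma, max_iter=500, tol=1e-10):
    """Hypothetical EM sketch: fit N(mu, sigma^2) when only counts[j] of
    observations in (edges[j], edges[j+1]] are known. Returns (mu, sigma)."""
    n = sum(counts)
    for _ in range(max_iter):
        s1 = s2 = 0.0
        for j, n_j in enumerate(counts):
            if n_j == 0:
                continue
            a = (edges[j] - mu) / sigma
            b = (edges[j + 1] - mu) / sigma
            p = norm_cdf(b) - norm_cdf(a)
            m = (norm_pdf(a) - norm_pdf(b)) / p          # standardized truncated mean
            v = 1.0 + (z_pdf(a) - z_pdf(b)) / p          # standardized truncated E[Z^2]
            # E-step: conditional moments E[X | X in interval] and E[X^2 | X in interval]
            s1 += n_j * (mu + sigma * m)
            s2 += n_j * (mu * mu + 2.0 * mu * sigma * m + sigma * sigma * v)
        # M-step: normal MLEs with sufficient statistics replaced by their expectations
        new_mu = s1 / n
        new_sigma = math.sqrt(s2 / n - new_mu * new_mu)
        done = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma

def group(data, edges):
    """Count how many values fall in each interval (edges[j], edges[j+1]]."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for j in range(len(counts)):
            if edges[j] < x <= edges[j + 1]:
                counts[j] += 1
                break
    return counts

# Hypothetical example: hide N(5, 2^2) data behind unit-width intervals with open tails.
random.seed(1)
data = [random.gauss(5.0, 2.0) for _ in range(20000)]
edges = [-math.inf] + [float(k) for k in range(1, 10)] + [math.inf]
counts = group(data, edges)
mu_hat, sigma_hat = em_grouped_normal(edges, counts, mu=4.0, sigma=1.0)
print(mu_hat, sigma_hat)
```

Starting deliberately far from the truth (mu=4, sigma=1), the iteration should recover estimates close to the generating values of 5 and 2 using only the interval counts.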