We consider the problem of identifying stable sets of mutually associated features in moderate- or high-dimensional binary data. In this context we develop and investigate a method called Latent Association Mining for Binary Data (LAMB). The LAMB method is based on a simple threshold model in which the observed binary values arise from a random thresholding of a latent continuous vector that may have a complex association structure. We consider a measure of latent association that quantifies association in the latent continuous vector without the bias introduced by the random thresholding. The LAMB method uses an iterative, testing-based search procedure to identify stable sets of mutually associated features. We compare the LAMB method with several competing methods on artificial binary-valued datasets and on two real count-valued datasets, and find that it detects meaningful associations in these datasets. For the count-valued datasets, the associations detected by the LAMB method are based only on whether each count is zero or non-zero, yet the method is competitive with methods that have access to the full count data.
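To illustrate why thresholding can bias a naive association measure, the following minimal sketch simulates a pair of latent Gaussian variables, thresholds them into binary values, and compares the ordinary correlation of the binary values (the phi coefficient) with a tetrachoric-style estimate of the latent correlation. This is not the LAMB procedure or its association measure: the fixed thresholds, the bivariate Gaussian latent model, and all function names (`simulate_thresholded_pair`, `phi_coefficient`, `joint_exceedance`, `latent_correlation`) are assumptions made purely for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf / cdf / quantiles


def simulate_thresholded_pair(n, rho, t1, t2, seed=0):
    """Draw n latent Gaussian pairs with correlation rho, then threshold them.

    Assumption: fixed thresholds t1, t2 (a simple special case of a
    threshold model, chosen here only for illustration).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(1 if z1 > t1 else 0)
        ys.append(1 if z2 > t2 else 0)
    return xs, ys


def phi_coefficient(xs, ys):
    """Pearson correlation computed directly on the binary values."""
    n = len(xs)
    p1 = sum(xs) / n
    p2 = sum(ys) / n
    p11 = sum(x * y for x, y in zip(xs, ys)) / n
    return (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))


def joint_exceedance(t1, t2, rho, steps=400, width=8.0):
    """P(Z1 > t1, Z2 > t2) for a standard bivariate normal (midpoint rule)."""
    scale = math.sqrt(1.0 - rho * rho)
    h = width / steps
    total = 0.0
    for i in range(steps):
        z = t1 + (i + 0.5) * h
        tail = 1.0 - STD_NORMAL.cdf((t2 - rho * z) / scale)
        total += STD_NORMAL.pdf(z) * tail
    return total * h


def latent_correlation(xs, ys, iters=40):
    """Tetrachoric-style estimate: find rho matching the observed joint rate.

    Thresholds are estimated from the marginal rates of ones, then rho is
    found by bisection, since the joint exceedance probability is
    increasing in rho.
    """
    n = len(xs)
    p1 = sum(xs) / n
    p2 = sum(ys) / n
    p11 = sum(x * y for x, y in zip(xs, ys)) / n
    t1 = STD_NORMAL.inv_cdf(1.0 - p1)
    t2 = STD_NORMAL.inv_cdf(1.0 - p2)
    lo, hi = -0.999, 0.999
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if joint_exceedance(t1, t2, mid) < p11:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    xs, ys = simulate_thresholded_pair(20000, rho=0.6, t1=1.0, t2=1.2, seed=1)
    print("phi on binary data:      ", round(phi_coefficient(xs, ys), 3))
    print("latent correlation (est):", round(latent_correlation(xs, ys), 3))
```

With sparse binary features (high thresholds), the phi coefficient is typically well below the latent correlation, while the threshold-aware estimate should land close to it. Numerical integration and bisection are used here only to keep the sketch dependency-free.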