In this paper, we revisit pattern mining and study the distribution underlying a binary dataset by means of the closure structure, which is based on passkeys, i.e., minimum generators in equivalence classes that are robust to noise. We introduce $\Delta$-closedness, a generalization of the closure operator, where $\Delta$ measures how much a closed set differs from its upper neighbors in the partial order induced by closure. A $\Delta$-equivalence class has minimum and maximum elements and allows us to characterize the distribution underlying the data. Moreover, the set of $\Delta$-equivalence classes can be partitioned into levels, which form the so-called $\Delta$-closure structure. In particular, a $\Delta$-equivalence class at a high level reveals correlations among many attributes, and these correlations are supported by more observations when $\Delta$ is large. In the experiments, we study the $\Delta$-closure structure of several real-world datasets and show that this structure is very stable for large $\Delta$ and does not substantially depend on the data sampling used for the analysis.
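To make the notion of $\Delta$ more concrete, the following minimal Python sketch computes the closure of an itemset in a small binary dataset and a $\Delta$ value for it. It rests on an assumption that the abstract does not confirm: $\Delta$ is taken to be the smallest drop in support when the closed set is extended by one further attribute, which is one common way to compare a closed set with its upper neighbors. The function names (`extent`, `closure`, `delta`) and the toy data are hypothetical and serve only as illustration.

```python
from typing import FrozenSet, List, Set

Row = FrozenSet[str]


def extent(data: List[Row], itemset: FrozenSet[str]) -> Set[int]:
    """Indices of the observations (rows) that contain every attribute of itemset."""
    return {i for i, row in enumerate(data) if itemset <= row}


def closure(data: List[Row], itemset: FrozenSet[str]) -> FrozenSet[str]:
    """Largest attribute set shared by all rows that contain itemset."""
    rows = extent(data, itemset)
    if not rows:
        # No row supports the itemset: by convention its closure is the full attribute set.
        return frozenset().union(*data)
    return frozenset.intersection(*(data[i] for i in rows))


def delta(data: List[Row], itemset: FrozenSet[str]) -> int:
    """ASSUMED definition: minimum loss of support over all one-attribute
    extensions of the closure of itemset."""
    closed = closure(data, itemset)
    support = len(extent(data, closed))
    outside = frozenset().union(*data) - closed
    if not outside:
        # The closed set already contains every attribute; no upper neighbor exists.
        return support
    return min(support - len(extent(data, closed | {m})) for m in outside)


# Toy binary dataset: each row is the set of attributes that hold for one observation.
data: List[Row] = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("b"),
]

closure_c = closure(data, frozenset("c"))   # {"a", "c"}: every row with c also has a
delta_ab = delta(data, frozenset("ab"))     # 2: adding c loses two of the four supporting rows
delta_a = delta(data, frozenset("a"))       # 1: adding b loses only one of the five rows
```

Under this assumed definition, a closed set with a larger $\Delta$ is separated from all of its one-attribute extensions by more observations, so removing a few rows is less likely to merge it with a neighbor. This matches the intuition of robustness to noise and sampling described above.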