Discovery data topology with the closure structure. Theoretical and practical aspects

This paper revisits pattern mining, and itemset mining in particular, which allows one to analyze binary datasets in an unsupervised way by searching for interesting and meaningful association rules and their respective itemsets. While a summarization of a dataset based on a set of patterns does not provide a general and satisfying view of the dataset, we introduce a concise representation, the closure structure, based on closed itemsets and their minimum generators, for capturing the intrinsic content of a dataset. The closure structure allows one to understand the topology of the dataset as a whole and the inherent complexity of the data. We propose a formalization of the closure structure in terms of Formal Concept Analysis, which is well adapted to studying this data topology. We present and demonstrate theoretical results, as well as practical results obtained with the GDPM algorithm. GDPM is rather unique in its functionality, as it returns a characterization of the topology of a dataset in terms of complexity levels, highlighting the diversity and the distribution of the itemsets. Finally, a series of experiments shows how GDPM can be used in practice and what can be expected from its output.
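
To make the notions of closed itemset and minimum generator more concrete, the following is a minimal brute-force sketch in Python. It is not the GDPM algorithm, whose details are not given in the abstract. The function names `closure` and `closure_levels` are hypothetical, and the sketch assumes that the level of a closed itemset is the size of its smallest generator, which is one plausible reading of "complexity levels".

```python
from itertools import combinations

def closure(itemset, transactions):
    """Return the items shared by every transaction that contains `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # No transaction contains the itemset: by convention its closure
        # is the set of all items.
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)

def closure_levels(transactions):
    """Map each closed itemset to the size of its smallest generator.

    Assumption (not taken from the source): this size is used as the
    "level" of the closed itemset. Exhaustive enumeration, so only
    suitable for toy datasets.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(frozenset().union(*transactions))
    levels = {}
    # Candidates are enumerated by increasing size, so the first candidate
    # that yields a given closed itemset is one of its minimum generators.
    for k in range(len(items) + 1):
        for candidate in combinations(items, k):
            closed = closure(frozenset(candidate), transactions)
            levels.setdefault(closed, k)
    return levels

if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
    for closed, level in sorted(closure_levels(data).items(),
                                key=lambda kv: (kv[1], sorted(kv[0]))):
        print(level, sorted(closed))
```

On this toy dataset the sketch finds six closed itemsets: the empty set at level 0; {a}, {b} and {a, c} at level 1; and {a, b} and {a, b, c} at level 2. The enumeration is exponential in the number of items, so it only illustrates the definitions.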

