Discovery data topology with the closure structure. Theoretical and practical aspects

In this paper, we revisit pattern mining, and itemset mining in particular, which allows one to analyze binary datasets by searching, in an unsupervised way, for interesting and meaningful association rules and their respective itemsets. Because a summary of a dataset based on a set of patterns does not provide a general and satisfying view of that dataset, we introduce a concise representation, the closure structure, based on closed itemsets and their minimum generators, for capturing the intrinsic content of a dataset. The closure structure allows one to understand the topology of the dataset as a whole and the inherent complexity of the data. We propose a formalization of the closure structure in terms of Formal Concept Analysis, which is well suited to studying this data topology. We present theoretical results together with practical results obtained using the GDPM algorithm. GDPM is distinctive in that it returns a characterization of the topology of a dataset in terms of complexity levels, highlighting the diversity and the distribution of the itemsets. Finally, a series of experiments shows how GDPM can be used in practice and what can be expected from its output.
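To make the notions of closed itemset, minimum generator, and complexity level more concrete, here is a minimal brute-force sketch in Python. It is not the GDPM algorithm, whose details are not given in this abstract. The toy dataset and the function names `closure` and `closure_levels` are assumptions introduced only for illustration. The sketch assumes that the level of a closed itemset is the size of its smallest generator, and it enumerates all item subsets, so it is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical toy binary dataset: each transaction is a set of items.
DATASET = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("cd"),
    frozenset("abcd"),
]


def closure(itemset, dataset):
    """Return the items shared by every transaction that contains `itemset`."""
    covering = [t for t in dataset if itemset <= t]
    if not covering:
        # No transaction contains the itemset: by convention, close to all items.
        return frozenset().union(*dataset)
    return frozenset.intersection(*covering)


def closure_levels(dataset):
    """Map each closed itemset to the size of its smallest generator.

    Subsets are enumerated by increasing size, so the first subset whose
    closure equals a given closed itemset is a generator of minimum size.
    This is an exponential brute-force illustration, not GDPM.
    """
    items = sorted(frozenset().union(*dataset))
    levels = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            closed = closure(frozenset(combo), dataset)
            # Keep only the first (smallest) size at which `closed` appears.
            levels.setdefault(closed, k)
    return levels


if __name__ == "__main__":
    for closed, level in sorted(closure_levels(DATASET).items(),
                                key=lambda kv: (kv[1], sorted(kv[0]))):
        print(level, "".join(sorted(closed)) or "(empty)")
```

In this toy dataset, the closed itemset `abcd` is generated by a two-item subset such as `ad`, so under the assumption above it sits at level 2 even though it contains four items. Grouping closed itemsets by this generator size is one way to read the "complexity levels" mentioned in the abstract.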

