In this paper we study concise representations of concepts and dependencies, i.e., implications and association rules. Such representations are based on equivalence classes and their characteristic elements: minimal generators, minimum generators (including keys and passkeys), proper premises, and pseudo-intents. All of these attribute sets are significant and well studied from a computational point of view, but their statistical properties remain to be studied. The purpose of this paper is to study these particular attribute sets and, in parallel, to investigate how to evaluate the complexity of a dataset from the point of view of Formal Concept Analysis (FCA). We analyze the empirical distributions and the sizes of these attribute sets. In addition, we propose several measures of data complexity that rely on these attribute sets, considering both real-world datasets and related randomized datasets.
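To make the notions of equivalence class and minimal generator concrete, the following is a minimal sketch in Python, not the method used in the paper. It assumes the standard FCA definitions: the closure of an attribute set is the set of attributes shared by all objects having it, two attribute sets are equivalent when they have the same closure, and a minimal generator is a set none of whose proper subsets has the same closure. The toy context and the names `CONTEXT`, `closure`, and `minimal_generators` are hypothetical and chosen for illustration only; the brute-force enumeration is exponential and only suitable for very small contexts.

```python
from itertools import combinations

# Hypothetical toy formal context: each object maps to the attributes it has.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"c"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def closure(attrs):
    """Return the closure of an attribute set (attributes common to its extent)."""
    attrs = set(attrs)
    # Extent: all objects that have every attribute in attrs.
    extent = [obj for obj, has in CONTEXT.items() if attrs <= has]
    if not extent:
        # By convention, an empty extent yields the full attribute set.
        return frozenset(ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[obj] for obj in extent)))


def minimal_generators():
    """Group minimal generators by the closed set (equivalence class) they generate."""
    classes = {}
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(ATTRIBUTES, size):
            closed = closure(subset)
            # Because closure is monotone, it suffices to check that removing
            # any single attribute changes the closure.
            if all(closure(set(subset) - {m}) != closed for m in subset):
                classes.setdefault(closed, []).append(frozenset(subset))
    return classes


if __name__ == "__main__":
    for closed, generators in minimal_generators().items():
        # Minimum generators (passkeys) are the minimal generators of smallest size.
        smallest = min(len(g) for g in generators)
        passkeys = [sorted(g) for g in generators if len(g) == smallest]
        print(sorted(closed), [sorted(g) for g in generators], passkeys)
```

Counting how many minimal generators fall in each class, and how large they are, gives the kind of empirical size distribution that the abstract refers to, here on a deliberately tiny example.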