This paper studies concise representations of concepts and dependencies, i.e., implications and association rules. Such representations are based on equivalence classes and their elements, namely minimal generators, minimum generators (including keys and passkeys), proper premises, and pseudo-intents. All these attribute sets are significant and well studied from a computational point of view, but their statistical properties remain to be studied. The purpose of this paper is to study these particular attribute sets and, in parallel, to study how to evaluate the complexity of a dataset from the point of view of Formal Concept Analysis (FCA). We analyze the empirical distributions and the sizes of these attribute sets. In addition, we propose several measures of data complexity, such as distributivity, linearity, size of concepts, and size of minimum generators, for the analysis of real-world and synthetic datasets.
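To make the vocabulary above concrete, the following is a minimal sketch of how equivalence classes and their generators can be enumerated on a toy formal context. It is not the paper's method: the context `CONTEXT`, the function names, and the brute-force enumeration over all attribute subsets are assumptions chosen for illustration, and the definitions used are the standard FCA ones (a minimal generator has no proper subset with the same closure; a minimum generator has the smallest cardinality in its class).

```python
from itertools import combinations

# Hypothetical toy context: each object maps to the set of attributes it has.
CONTEXT = {
    "g1": {"a", "b", "c"},
    "g2": {"b"},
    "g3": {"c"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def closure(attrs):
    """Return the attributes shared by all objects that contain `attrs`."""
    attrs = frozenset(attrs)
    result = ATTRIBUTES  # closure is the full attribute set if no object matches
    for intent in CONTEXT.values():
        if attrs <= intent:
            result = result & intent
    return frozenset(result)


def equivalence_classes():
    """Group every attribute subset by its closure (exponential; toy sizes only)."""
    classes = {}
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), size):
            classes.setdefault(closure(subset), []).append(frozenset(subset))
    return classes


def minimal_generators(members):
    """Keep the members that have no proper subset in the same class."""
    return [g for g in members if not any(h < g for h in members)]


def minimum_generators(members):
    """Keep the members of smallest cardinality in the class."""
    smallest = min(len(g) for g in members)
    return [g for g in members if len(g) == smallest]


if __name__ == "__main__":
    for closed_set, members in equivalence_classes().items():
        print(
            sorted(closed_set),
            "minimal:", [sorted(g) for g in minimal_generators(members)],
            "minimum:", [sorted(g) for g in minimum_generators(members)],
        )
```

In this toy context the class of the closed set {a, b, c} has two minimal generators, {a} and {b, c}, but only {a} is a minimum generator, which shows why the two notions are counted separately. Statistics such as the number of classes or the distribution of generator sizes can then be read off the returned dictionary.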