Pattern mining is well established in data mining research, especially for binary datasets. Surprisingly, there is much less work on numerical pattern mining, and this research area remains under-explored. In this paper, we propose Mint, an efficient MDL-based algorithm for mining numerical datasets. The MDL principle is a robust and reliable framework widely used in pattern mining and also in subgroup discovery. In Mint, we reuse MDL to discover useful patterns and return a set of non-redundant, overlapping patterns that have well-defined boundaries and cover meaningful groups of objects. Mint is not the only MDL-based numerical pattern miner. In the experiments presented in the paper, we show that Mint outperforms its competitors, including Slim and RealKrimp.
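The abstract does not describe Mint's actual encoding. As a rough illustration of the generic two-part MDL idea it builds on, here is a toy sketch. It picks the pattern that minimizes the model's code length plus the data's code length given the model, L(M) + L(D | M). A single integer interval on one attribute stands in for "a pattern with well-defined boundaries". The domain size `DOMAIN`, the one-flag-bit-per-value scheme, and the functions `baseline_bits`, `two_part_bits` and `best_interval` are hypothetical assumptions made for illustration. They are not Mint's method.

```python
import math
from itertools import combinations_with_replacement

# Hypothetical toy setting: integer values in [0, DOMAIN). Not Mint's encoding.
DOMAIN = 16


def baseline_bits(data):
    """Code length without any pattern: every value costs log2(DOMAIN) bits."""
    return len(data) * math.log2(DOMAIN)


def two_part_bits(data, lo, hi):
    """Toy two-part MDL length L(M) + L(D | M) for one interval pattern [lo, hi]."""
    model_bits = 2 * math.log2(DOMAIN)  # L(M): encode both interval bounds
    width = hi - lo + 1
    data_bits = 0.0
    for x in data:
        if lo <= x <= hi:
            # 1 flag bit ("covered") + position inside the interval
            data_bits += 1 + math.log2(width)
        else:
            # 1 flag bit ("not covered") + value over the full domain
            data_bits += 1 + math.log2(DOMAIN)
    return model_bits + data_bits


def best_interval(data):
    """Exhaustively return (cost, (lo, hi)) for the interval with minimal total length."""
    candidates = combinations_with_replacement(range(DOMAIN), 2)  # all lo <= hi
    return min((two_part_bits(data, lo, hi), (lo, hi)) for lo, hi in candidates)


data = [4, 5, 6, 7] * 10
cost, interval = best_interval(data)
print(interval, cost, baseline_bits(data))  # (4, 7) 128.0 160.0
```

The interval [4, 7] wins because its 8 bits of model cost are more than repaid by shorter data codes (3 bits per value instead of 4). Wider intervals cost more per covered value, and narrower ones leave values to be encoded at full cost. This is the same trade-off that makes MDL favor compact, non-redundant pattern sets.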