This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim of obtaining compact, high-quality sets of patterns. After outlining relevant concepts from information theory and coding, as well as work on the theory behind MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems.