Detecting sets of relevant patterns in a given dataset is an important challenge in data mining. The relevance of a pattern, also called utility in the literature, is a subjective measure that can be assessed from very different points of view. Rule-based languages such as Answer Set Programming (ASP) seem well suited for specifying user-provided criteria for assessing pattern utility in the form of constraints; moreover, the declarativity of ASP makes it easy to switch between criteria in order to analyze the dataset from different points of view. In this paper, we take steps toward extending the notion of High Utility Pattern Mining (HUPM); in particular, we introduce a new framework that allows for new classes of utility criteria not considered in the previous literature. We also show how recent extensions of ASP with external functions can support fast and effective encoding and testing of the new framework. To demonstrate the potential of the proposed framework, we use it as a building block in the definition of an innovative method for predicting ICU admission for COVID-19 patients. Finally, extensive experiments demonstrate the effectiveness of the proposed approach from both a quantitative and a qualitative point of view. Under consideration in Theory and Practice of Logic Programming (TPLP).