Frequent temporal patterns discovered in time-interval-based multivariate data, although syntactically correct, might be non-transparent: for some pattern instances, there might exist intervals for the same entity that contradict the pattern's usual meaning. We conjecture that non-transparent patterns are also less useful as classification or prediction features. We propose a new pruning constraint for the frequent temporal-pattern discovery process, the Semantic Adjacency Criterion (SAC), which exploits domain knowledge to filter out patterns that contain potentially semantically contradictory components. Previously, we had informally presented the SAC principle and showed that using it to prune patterns enhances the repeatability of their discovery in the same clinical domain. Here, we formally define the semantics of three SAC versions, embed them in a frequent-temporal-pattern discovery framework, and compare the use of the set of pruned patterns to the use of the complete set of discovered patterns, as features for classification and prediction tasks in three different medical domains. We induced four classifiers for each task, using four machine-learning methods: Random Forests, Naive Bayes, SVM, and Logistic Regression. The features were the frequent temporal patterns discovered in each data set. SAC-based temporal-pattern discovery reduced the number of discovered patterns by up to 97% and the discovery runtime by up to 98%. Nevertheless, the classification and prediction performance of the reduced, SAC-based feature set was as good as that of the complete set. Using SAC can thus significantly reduce the number of discovered frequent interval-based temporal patterns, and the corresponding computational effort, without losing classification or prediction performance.