We introduce HiPaR, a novel pattern-aided regression method for tabular data containing both categorical and numerical attributes. HiPaR mines hybrid rules of the form $p \Rightarrow y = f(X)$, where $p$ characterizes a region of the data and $f(X)$ is a linear regression model for a target variable $y$. HiPaR relies on pattern mining techniques to identify regions of the data where the target variable can be accurately explained by local linear models. The novelty of the method lies in combining an enumerative exploration of the space of regions with efficient heuristics that guide the search. This strategy provides more flexibility when selecting a small set of jointly accurate and human-readable hybrid rules that explain the entire dataset. As our experiments show, HiPaR mines fewer rules than existing pattern-based regression methods while still attaining state-of-the-art prediction performance.
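To make the notion of a hybrid rule $p \Rightarrow y = f(X)$ concrete, the following is a minimal sketch, not HiPaR itself: the region $p$ is chosen by hand rather than mined, and the toy data, the attribute names (`area`, `kind`, `price`), and the helper functions `fit_hybrid_rule` and `predict` are all hypothetical. It only illustrates that a linear model fitted inside a well-chosen region can explain the target better there than a single global linear model.

```python
import numpy as np

def fit_hybrid_rule(X, y, mask):
    """Least-squares fit of y = w.x + b on the rows selected by mask."""
    A = np.column_stack([X[mask], np.ones(mask.sum())])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
    return coef  # last entry is the intercept b

def predict(coef, X):
    """Evaluate the linear model f(X) = X w + b."""
    return X @ coef[:-1] + coef[-1]

# Hypothetical, noise-free toy data: the target depends on `area`
# differently for each value of the categorical attribute `kind`.
rng = np.random.default_rng(0)
area = rng.uniform(20, 200, size=200)
kind = rng.choice(["flat", "house"], size=200)
price = np.where(kind == "flat", 3.0 * area + 10.0, 1.5 * area + 50.0)
X = area.reshape(-1, 1)

# Hand-picked region p: kind = "flat" (HiPaR would search for such regions).
region = kind == "flat"
coef = fit_hybrid_rule(X, price, region)

# Baseline: one linear model fitted on the whole dataset.
global_coef = fit_hybrid_rule(X, price, np.ones(len(price), dtype=bool))

# Mean absolute error of each model, measured inside the region.
local_err = np.abs(predict(coef, X[region]) - price[region]).mean()
global_err = np.abs(predict(global_coef, X[region]) - price[region]).mean()
print(f"rule: kind = 'flat' => price = {coef[0]:.2f} * area + {coef[1]:.2f}")
print(f"error inside region: local model {local_err:.4f}, global model {global_err:.4f}")
```

In this sketch the condition is a single categorical test; conditions on numerical attributes (for example, intervals) would be expressed as boolean masks in the same way.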