Modern machine learning methods are designed to exploit complex patterns in data regardless of their form, but they do not necessarily reveal those patterns to the investigator. Here we demonstrate situations in which modern machine learning methods are ill-equipped to reveal feature interaction effects and other nonlinear relationships. We propose the use of a conjecturing machine that generates feature relationships, in the form of bounds for numerical features and Boolean expressions for nominal features, that machine learning algorithms ignore. The proposed framework is demonstrated on a classification problem with an interaction effect and on a nonlinear regression problem. In both settings, the true underlying relationships are revealed and generalization performance improves. The framework is then applied to patient-level data on COVID-19 outcomes to suggest possible risk factors.