Learning from data has led to substantial advances in a multitude of disciplines, including text and multimedia search, speech recognition, and autonomous-vehicle navigation. Can machine learning enable similar leaps in the natural and social sciences? This is certainly the expectation in many scientific fields, and recent years have seen a plethora of applications of non-linear models to a wide range of datasets. However, flexible non-linear models do not always improve upon linear regression models to which transforms of, and interactions between, variables have been added manually. We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models. Furthermore, for a variety of applications in the natural and social sciences, we demonstrate why more complex regression models may or may not yield improvements.