Title
Constructing Effective Machine Learning Models for the Sciences: A Multidisciplinary Perspective
Authors
Abstract
Learning from data has led to substantial advances in a multitude of disciplines, including text and multimedia search, speech recognition, and autonomous-vehicle navigation. Can machine learning enable similar leaps in the natural and social sciences? This is certainly the expectation in many scientific fields, and recent years have seen a plethora of applications of non-linear models to a wide range of datasets. However, flexible non-linear solutions will not always improve upon manually adding transforms and interactions between variables to linear regression models. We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models. Furthermore, for a variety of applications in the natural and social sciences, we demonstrate why improvements may be seen with more complex regression models and why they may not.
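
As an illustration of the abstract's point about manually adding interactions to a linear regression, the following minimal sketch fits ordinary least squares with and without a hand-added interaction column. It is not taken from the paper: the synthetic, noise-free data, the coefficients, and the helper names `solve` and `fit_mse` are all assumptions made for this example.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mse(features, y):
    """Fit ordinary least squares via the normal equations; return the training MSE."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(features, y)) for i in range(k)]
    w = solve(xtx, xty)
    preds = [sum(wi * fi for wi, fi in zip(w, row)) for row in features]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


# Hypothetical data: the target depends on an interaction between x1 and x2.
rng = random.Random(0)
x1 = [rng.uniform(-1, 1) for _ in range(200)]
x2 = [rng.uniform(-1, 1) for _ in range(200)]
y = [1.0 + 2.0 * a - b + 3.0 * a * b for a, b in zip(x1, x2)]

# Plain linear model: intercept, x1, x2.
plain = [[1.0, a, b] for a, b in zip(x1, x2)]
# Same model with one manually added interaction column x1 * x2.
augmented = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]

mse_plain = fit_mse(plain, y)
mse_augmented = fit_mse(augmented, y)
print(f"plain MSE: {mse_plain:.4f}, with interaction: {mse_augmented:.2e}")
```

In this toy setting the model with the interaction column recovers the target almost exactly while staying fully interpretable, so a flexible non-linear model would have little room to improve on it. On real data the comparison would need held-out evaluation rather than training error.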