Paper Title
Sources of Irreproducibility in Machine Learning: A Review
Authors
Abstract
Background: Many published machine learning studies are irreproducible. Methodological issues and a failure to properly account for variation introduced by the algorithms themselves or by their implementations are cited as the main contributors to this irreproducibility. Problem: There exists no theoretical framework that relates experiment design choices to their potential effects on the conclusions. Without such a framework, it is much harder for practitioners and researchers to evaluate experiment results and describe the limitations of experiments. The lack of such a framework also makes it harder for independent researchers to systematically attribute the causes of failed reproducibility experiments. Objective: The objective of this paper is to develop a framework that enables applied data science practitioners and researchers to understand which experiment design choices can lead to false findings, and how, and thereby to help analyze the conclusions of reproducibility experiments. Method: We have compiled an extensive list of factors reported in the literature that can lead to machine learning studies being irreproducible. These factors are organized and categorized in a reproducibility framework motivated by the stages of the scientific method. The factors are analyzed for how they can affect the conclusions drawn from experiments. A model comparison study is used as an example. Conclusion: We provide a framework that describes machine learning methodology from experimental design decisions to the conclusions inferred from them.