Code Smells for Machine Learning Applications

Paper Title

Code Smells for Machine Learning Applications

Authors

Haiyin Zhang, Luís Cruz, Arie van Deursen

Abstract

The popularity of machine learning has wildly expanded in recent years. Machine learning techniques have been heatedly studied in academia and applied in the industry to create business value. However, there is a lack of guidelines for code quality in machine learning applications. In particular, code smells have rarely been studied in this domain. Although machine learning code is usually integrated as a small part of an overarching system, it usually plays an important role in its core functionality. Hence ensuring code quality is quintessential to avoid issues in the long run. This paper proposes and identifies a list of 22 machine learning-specific code smells collected from various sources, including papers, grey literature, GitHub commits, and Stack Overflow posts. We pinpoint each smell with a description of its context, potential issues in the long run, and proposed solutions. In addition, we link them to their respective pipeline stage and the evidence from both academic and grey literature. The code smell catalog helps data scientists and developers produce and maintain high-quality machine learning application code.
