Paper Title

Leave Zero Out: Towards a No-Cross-Validation Approach for Model Selection

Paper Authors

Weikai Li, Chuanxing Geng, Songcan Chen

Paper Abstract

As the main workhorse for model selection, Cross Validation (CV) has achieved empirical success thanks to its simplicity and intuitiveness. However, despite its ubiquity, CV often runs into the following notorious dilemmas. On the one hand, for small-data cases, CV suffers from a conservatively biased estimate, since part of the limited data has to be held out for validation. On the other hand, for large-data cases, CV tends to be extremely cumbersome, e.g., intolerably time-consuming, due to its repeated training procedures. A natural ambition for CV is therefore to validate models at far lower computational cost while making full use of the entire given dataset for training. Thus, instead of holding out part of the given data, this paper strategically derives a cheap and theoretically guaranteed auxiliary/augmented validation set. This embarrassingly simple strategy only needs to train each model once on the entire given dataset, making model selection considerably efficient. In addition, the proposed validation approach suits a wide range of learning settings, because both the augmentation and the out-of-sample estimation are independent of the learning process. Finally, we demonstrate the accuracy and computational benefits of the proposed method through extensive evaluation on multiple datasets, models, and tasks.
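
The abstract describes the pipeline only at a high level, so the snippet below is a minimal illustrative sketch rather than the authors' actual procedure. It contrasts standard 5-fold CV, which trains each candidate five times, with a train-once scheme scored on a synthetic augmented validation set. The particular augmentation used here (Gaussian input perturbation with labels propagated from the nearest training point) is a placeholder assumption; the paper derives its own theoretically guaranteed augmented validation.

```python
# Sketch only: contrasts k-fold CV with a "leave zero out" style selection
# that trains each candidate once on the full dataset and scores it on an
# augmented validation set. The augmentation scheme below is an assumption,
# not the paper's derivation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
candidates = {f"C={c}": LogisticRegression(C=c, max_iter=1000)
              for c in (0.01, 0.1, 1.0, 10.0)}

# Baseline: 5-fold CV trains every candidate model five times.
cv_scores = {name: cross_val_score(model, X, y, cv=5).mean()
             for name, model in candidates.items()}

# Build an augmented validation set once (assumed scheme): perturb the
# inputs and propagate each label from the nearest original training point.
rng = np.random.default_rng(0)
X_aug = X + 0.1 * rng.standard_normal(X.shape)
nn = NearestNeighbors(n_neighbors=1).fit(X)
y_aug = y[nn.kneighbors(X_aug, return_distance=False).ravel()]

# Each candidate is now trained exactly once, on the entire dataset.
lzo_scores = {name: model.fit(X, y).score(X_aug, y_aug)
              for name, model in candidates.items()}

print("5-fold CV pick :", max(cv_scores, key=cv_scores.get))
print("train-once pick:", max(lzo_scores, key=lzo_scores.get))
```

The computational contrast is the point of the sketch: CV retrains each candidate k times, while the train-once scheme pays for a single fit per candidate plus one shared augmentation step, matching the abstract's claim of training on the entire dataset only once.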
