Paper Title

Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

Paper Authors

Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, Qing Qu

Paper Abstract

With the ever-increasing complexity of large-scale pre-trained models, coupled with a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, the fine-tuning process for large-scale pre-trained vision models still relies mostly on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing yet prevalent phenomenon that has recently been discovered in the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, NC implies that the variability of the features within each class diminishes to zero, while the means of features across classes become maximally and equally distant. In this work, we examine the NC properties of pre-trained models on both downstream and source data for transfer learning, and we find a strong correlation between feature collapse and downstream performance. In particular, we discover a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more the features of the pre-trained model collapse on the downstream training data, the higher the transfer accuracy. Additionally, we study the relationship between NC and transfer accuracy on the source data. These findings allow us to develop a principled, parameter-efficient fine-tuning method that employs skip connections to induce last-layer feature collapse on downstream data. Our proposed fine-tuning method delivers strong performance while reducing the number of fine-tuned parameters by at least 90% and mitigating overfitting, especially when downstream data is scarce.
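The degree of feature collapse referenced in the abstract is commonly quantified by the within-class variability of last-layer features relative to the between-class variability (the NC1 metric of the neural-collapse literature, Tr(Σ_W Σ_B†)/K). The sketch below is an illustrative NumPy implementation of that metric, not the paper's exact code; the feature matrix, labels, and function name are hypothetical placeholders.

```python
import numpy as np

def nc1_metric(features, labels):
    """Within-class variability collapse (NC1-style) metric.

    features: (n_samples, d) array of last-layer features.
    labels:   (n_samples,) integer class labels in {0, ..., K-1}.
    Returns Tr(Sigma_W @ pinv(Sigma_B)) / K; smaller values indicate
    stronger feature collapse.
    """
    classes = np.unique(labels)
    K, d = len(classes), features.shape[1]
    n = len(features)
    global_mean = features.mean(axis=0)

    sigma_w = np.zeros((d, d))  # within-class covariance (averaged over samples)
    sigma_b = np.zeros((d, d))  # between-class covariance (of class means)
    for c in classes:
        feats_c = features[labels == c]
        mean_c = feats_c.mean(axis=0)
        centered = feats_c - mean_c
        sigma_w += centered.T @ centered / n
        diff = (mean_c - global_mean)[:, None]
        sigma_b += diff @ diff.T / K

    return np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / K

# Hypothetical usage with random features for two classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
labs = rng.integers(0, 2, size=200)
print(nc1_metric(feats, labs))
```

In the spirit of the abstract, a value of this metric computed on the downstream training features would be expected to correlate (inversely) with linear-probing transfer accuracy.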
