Paper Title
SpaceNet: Make Free Space For Continual Learning
Paper Authors
Paper Abstract
The continual learning (CL) paradigm aims to enable neural networks to learn tasks continually in a sequential fashion. The fundamental challenge in this learning paradigm is catastrophic forgetting of previously learned tasks when the model is optimized for a new task, especially when their data is not accessible. Current architectural-based methods aim at alleviating the catastrophic forgetting problem, but at the expense of expanding the capacity of the model. Regularization-based methods maintain a fixed model capacity; however, previous studies have shown that these methods suffer a large performance degradation when the task identity is not available during inference (e.g., the class-incremental learning scenario). In this work, we propose a novel architectural-based method, referred to as SpaceNet, for the class-incremental learning scenario, in which we utilize the available fixed capacity of the model intelligently. SpaceNet trains sparse deep neural networks from scratch in an adaptive way that compresses the sparse connections of each task into a compact number of neurons. The adaptive training of the sparse connections results in sparse representations that reduce the interference between tasks. Experimental results show the robustness of our proposed method against catastrophic forgetting of old tasks and the efficiency of SpaceNet in utilizing the available capacity of the model, leaving space for more tasks to be learned. In particular, when SpaceNet is tested on the well-known benchmarks for CL: split MNIST, split Fashion-MNIST, and CIFAR-10/100, it outperforms regularization-based methods by a large margin. Moreover, it achieves better performance than architectural-based methods without model expansion, and obtains results comparable to rehearsal-based methods while offering a large reduction in memory.
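To make the idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of the fixed-capacity, per-task sparse-connection bookkeeping the abstract describes: each task claims a small subset of the still-free connections of a shared weight matrix, trains only those, and then freezes them so later tasks cannot interfere with them. The class name FixedCapacitySparseLayer, the random mask allocation, and all hyperparameters are illustrative assumptions; SpaceNet's actual adaptive sparse training, which redistributes connections during training and compresses them into a compact set of neurons, is not reproduced here.

import torch

class FixedCapacitySparseLayer(torch.nn.Module):
    """One weight matrix shared by all tasks; each task owns a disjoint sparse mask.
    Hypothetical illustration only -- not the SpaceNet algorithm itself."""

    def __init__(self, in_features, out_features, sparsity_per_task=0.05):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.zeros(out_features, in_features))
        # Connections not yet claimed by any task (remaining capacity of the model).
        self.free = torch.ones(out_features, in_features, dtype=torch.bool)
        # Connections owned by already-learned tasks; kept fixed to avoid forgetting.
        self.frozen = torch.zeros(out_features, in_features, dtype=torch.bool)
        self.sparsity_per_task = sparsity_per_task

    def allocate_task_mask(self):
        """Claim a small random subset of the still-free connections for a new task.
        (SpaceNet allocates connections adaptively; random selection keeps the sketch short.)"""
        n_new = int(self.sparsity_per_task * self.weight.numel())
        free_idx = self.free.flatten().nonzero().squeeze(1)
        chosen = free_idx[torch.randperm(len(free_idx))[:n_new]]
        mask = torch.zeros_like(self.free).flatten()
        mask[chosen] = True
        mask = mask.view_as(self.free)
        self.free &= ~mask
        with torch.no_grad():  # initialize only the newly claimed connections
            self.weight[mask] = 0.01 * torch.randn(int(mask.sum()))
        return mask

    def forward(self, x, task_mask):
        # Frozen connections of old tasks plus the current task's connections are active.
        active = (self.frozen | task_mask).float()
        return x @ (self.weight * active).t()

    def mask_gradients(self, task_mask):
        """Call after loss.backward(): zero gradients outside the current task's
        connections so the weights of old tasks are never updated."""
        if self.weight.grad is not None:
            self.weight.grad *= task_mask.float()

    def freeze_task_mask(self, task_mask):
        """Call once training on the current task is done."""
        self.frozen |= task_mask

# Illustrative training loop for one task (dummy data, single layer, no biases).
layer = FixedCapacitySparseLayer(in_features=784, out_features=10)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
task_mask = layer.allocate_task_mask()
for _ in range(100):
    x = torch.randn(32, 784)                 # stand-in batch for the current task
    y = torch.randint(0, 10, (32,))
    loss = torch.nn.functional.cross_entropy(layer(x, task_mask), y)
    optimizer.zero_grad()
    loss.backward()
    layer.mask_gradients(task_mask)          # protect previously learned tasks
    optimizer.step()
layer.freeze_task_mask(task_mask)            # this task's connections are now fixed

In this sketch, the boolean "free" tensor directly exposes how much of the fixed capacity remains for future tasks, which mirrors the abstract's point about leaving space for more tasks; a real implementation would also handle multiple layers, per-task output heads, and the adaptive connection redistribution described in the paper.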