Paper Title
SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
Paper Authors
Paper Abstract
Neural architecture search (NAS) has demonstrated remarkable success in finding efficient deep neural networks (DNNs) within a given supernet. In parallel, the lottery ticket hypothesis has shown that DNNs contain small subnetworks that can be trained from scratch to achieve accuracy comparable to or higher than that of the original DNNs. As such, it is currently common practice to develop efficient DNNs via a pipeline of first searching and then pruning. Nevertheless, doing so often requires a search-train-prune-retrain process and thus prohibitive computational cost. In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term SuperTickets, via a two-in-one training scheme that jointly performs architecture search and parameter pruning. Moreover, we develop a progressive and unified SuperTickets identification strategy that allows the connectivity of subnetworks to change during supernet training, achieving better accuracy and efficiency trade-offs than conventional sparse training. Finally, we evaluate whether SuperTickets identified on one task transfer well to other tasks, validating their potential for handling multiple tasks simultaneously. Extensive experiments and ablation studies on three tasks and four benchmark datasets validate that our proposed SuperTickets achieve better accuracy and efficiency trade-offs than both typical NAS and pruning pipelines, with or without retraining. Code and pretrained models are available at https://github.com/RICE-EIC/SuperTickets.
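To make the two-in-one idea concrete, below is a minimal sketch (not the authors' implementation) of the key mechanism the abstract describes: pruning progressively *during* training while re-deriving the sparsity mask from the current weights each round, so that previously pruned connections may regrow, unlike fixed-mask sparse training. All function names, the toy random-noise "training" step, and the schedule constants are hypothetical illustrations; real supernet search and gradient updates are elided.

```python
import random

def magnitude_prune(weights, sparsity):
    """Return a 0/1 mask keeping the largest-magnitude (1 - sparsity) fraction."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return [1.0] * len(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [1.0 if abs(w) > threshold else 0.0 for w in weights]

def two_in_one_training(weights, total_epochs=100, prune_every=10, final_sparsity=0.8):
    """Toy sketch: prune progressively during (simulated) supernet training.
    The mask is recomputed from the current weights at each pruning round,
    so connectivity can change (pruned weights may regrow)."""
    mask = [1.0] * len(weights)
    for epoch in range(1, total_epochs + 1):
        # Placeholder for one epoch of joint architecture search + weight training.
        weights = [w + 0.01 * random.gauss(0, 1) * m for w, m in zip(weights, mask)]
        if epoch % prune_every == 0:
            # Progressive schedule: sparsity ramps up toward final_sparsity.
            sparsity = final_sparsity * epoch / total_epochs
            mask = magnitude_prune(weights, sparsity)
    return [w * m for w, m in zip(weights, mask)], mask
```

The design point illustrated here is that the mask is derived fresh at every pruning round rather than only shrinking monotonically, which is what distinguishes the progressive identification strategy from conventional one-shot or fixed-mask sparse training.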