Paper Title
PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation
Paper Authors
Paper Abstract
Prompt Transfer (PoT) is a recently proposed approach to improve prompt-tuning by initializing the target prompt with an existing prompt trained on a similar source task. However, such a vanilla PoT approach usually achieves sub-optimal performance, as (i) PoT is sensitive to the similarity of the source-target pair, and (ii) directly fine-tuning the target prompt initialized with the source prompt on the target task may lead to forgetting the useful general knowledge learned from the source task. To tackle these issues, we propose a new metric to accurately predict prompt transferability (regarding (i)), and a novel PoT approach (namely PANDA) that leverages the knowledge distillation technique to effectively alleviate knowledge forgetting (regarding (ii)). Extensive and systematic experiments on 189 combinations of 21 source and 9 target datasets across 5 scales of PLMs demonstrate that: 1) our proposed metric works well to predict prompt transferability; 2) our PANDA consistently outperforms the vanilla PoT approach by 2.3% average score (up to 24.1%) across all tasks and model sizes; 3) with our PANDA approach, prompt-tuning can achieve competitive and even better performance than model-tuning in various PLM-scale scenarios. We have publicly released our code at https://github.com/WHU-ZQH/PANDA.
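For concreteness, the sketch below illustrates the two ingredients the abstract names: prompt-transfer initialization (the target prompt starts from the source prompt) and a knowledge-distillation regularizer that keeps the tuned prompt close to the knowledge held by the source prompt. This is a minimal sketch of the general idea only, not the authors' implementation; all names (ToyPromptModel, panda_style_step, kd_weight, tau) and the toy frozen backbone are illustrative assumptions, and the actual transferability metric and teacher construction are described in the paper and repository.

```python
# Minimal sketch: prompt transfer + knowledge-distillation regularizer (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyPromptModel(nn.Module):
    """A frozen 'PLM' stand-in whose only trainable part is a soft prompt."""

    def __init__(self, prompt_len: int = 8, hidden: int = 32, num_labels: int = 2):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)
        self.backbone = nn.Linear(hidden, num_labels)
        self.backbone.requires_grad_(False)  # backbone stays frozen in prompt-tuning

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Toy "fusion" of input features with the pooled soft prompt.
        pooled_prompt = self.prompt.mean(dim=0)
        return self.backbone(x + pooled_prompt)


def panda_style_step(student, teacher, x, y, kd_weight: float = 1.0, tau: float = 1.0):
    """One training step: target-task loss + KD loss against a frozen teacher.

    The teacher holds the source prompt (knowledge to preserve); the student's
    prompt was initialized from it and is being tuned on the target task.
    """
    student_logits = student(x)
    with torch.no_grad():
        teacher_logits = teacher(x)

    task_loss = F.cross_entropy(student_logits, y)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return task_loss + kd_weight * kd_loss


# Usage: the source prompt initializes both the frozen teacher and the student.
teacher = ToyPromptModel()
student = ToyPromptModel()
student.prompt.data.copy_(teacher.prompt.data)  # vanilla PoT initialization
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam([student.prompt], lr=1e-3)
x, y = torch.randn(16, 32), torch.randint(0, 2, (16,))
loss = panda_style_step(student, teacher, x, y)
loss.backward()
optimizer.step()
```

The design point this sketch is meant to capture: without the KD term, the step reduces to vanilla PoT (plain fine-tuning of the initialized prompt), which the abstract says can forget the general knowledge carried over from the source task; the distillation term regularizes the student toward the teacher's predictions while the task loss adapts it to the target data.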