Paper Title
Improving Robustness and Generality of NLP Models Using Disentangled Representations
Paper Authors
Paper Abstract
Supervised neural networks, which first map an input $x$ to a single representation $z$ and then map $z$ to the output label $y$, have achieved remarkable success in a wide range of natural language processing (NLP) tasks. Despite this success, neural models lack both robustness and generality: small perturbations to inputs can produce completely different outputs, and the performance of a model trained on one domain drops drastically when it is tested on another domain. In this paper, we present methods to improve the robustness and generality of NLP models from the standpoint of disentangled representation learning. Instead of mapping $x$ to a single representation $z$, the proposed strategy maps $x$ to a set of representations $\{z_1, z_2, \ldots, z_K\}$ while forcing them to be disentangled. These representations are then mapped to different logits $l_1, l_2, \ldots, l_K$, whose ensemble is used to make the final prediction $y$. We propose different methods to incorporate this idea into currently widely used models, including adding an $L_2$ regularizer on the $z_k$ or adding a Total Correlation (TC) term under the framework of the variational information bottleneck (VIB). We show that models trained with the proposed criteria provide better robustness and domain-adaptation ability in a wide range of supervised learning tasks.
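The abstract only names the two regularizers, so the following is a minimal sketch of the overall architecture rather than the authors' implementation. The standard definition of Total Correlation used in VIB-style objectives is

$$\mathrm{TC}(z_1,\ldots,z_K) = D_{\mathrm{KL}}\!\left(q(z_1,\ldots,z_K)\,\Big\|\,\textstyle\prod_{k=1}^{K} q(z_k)\right),$$

which is zero exactly when the $z_k$ are mutually independent. The PyTorch sketch below shows one plausible reading of the $L_2$ variant: $K$ parallel encoders produce $\{z_1,\ldots,z_K\}$, each $z_k$ feeds its own classification head, the logits are averaged, and a pairwise penalty on the $z_k$ discourages them from encoding the same information. The module, the layer sizes, and the exact penalty form are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledClassifier(nn.Module):
    """Sketch of the x -> {z_1, ..., z_K} -> ensembled-logits pipeline.

    All layer sizes and the choice of K are illustrative assumptions.
    """
    def __init__(self, input_dim: int, hidden_dim: int, num_labels: int, K: int = 4):
        super().__init__()
        # K parallel encoders, one per representation z_k.
        self.encoders = nn.ModuleList(nn.Linear(input_dim, hidden_dim) for _ in range(K))
        # One classification head per z_k, producing logits l_k.
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, num_labels) for _ in range(K))

    def forward(self, x: torch.Tensor):
        zs = [torch.tanh(enc(x)) for enc in self.encoders]            # {z_1, ..., z_K}
        logits = torch.stack([h(z) for h, z in zip(self.heads, zs)])  # (K, batch, labels)
        return logits.mean(dim=0), zs                                 # ensemble, plus the z's

def pairwise_l2_penalty(zs):
    """Hypothetical L2-style disentanglement term: pushes each pair (z_i, z_j)
    apart by penalizing the squared cosine similarity between them."""
    penalty = zs[0].new_zeros(())
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            penalty = penalty + F.cosine_similarity(zs[i], zs[j], dim=-1).pow(2).mean()
    return penalty

# Usage sketch: standard cross-entropy plus the weighted disentanglement term.
model = DisentangledClassifier(input_dim=768, hidden_dim=256, num_labels=2)
x, y = torch.randn(8, 768), torch.randint(0, 2, (8,))
logits, zs = model(x)
loss = F.cross_entropy(logits, y) + 0.1 * pairwise_l2_penalty(zs)
loss.backward()
```

In this reading, the TC variant would replace `pairwise_l2_penalty` with an estimate of the KL term above under the VIB objective; the ensemble over heads is what the abstract describes as combining the logits $l_k$ into the final prediction $y$.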