Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

Paper Title

Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

Authors

Xiaoyu Zhang, Ajmal Mian, Rohit Gupta, Nazanin Rahnavard, Mubarak Shah

Abstract

Deep neural networks are being widely deployed for many critical tasks due to their high classification accuracy. In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models. These malicious behaviors can be triggered at the adversary's will and hence pose a serious threat to the widespread deployment of deep models. We propose a method to verify whether a pre-trained model is Trojaned or benign. Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients. Inserting backdoors into a network alters its decision boundaries, which are effectively encoded in its adversarial perturbations. We train a two-stream network for Trojan detection from a network's global ($L_\infty$ and $L_2$ bounded) perturbations and the localized region of high energy within each perturbation. The former encodes the decision boundaries of the network and the latter encodes the unknown trigger shape. We also propose an anomaly detection method to identify the target class in a Trojaned network. Our methods are invariant to the trigger type, trigger size, training data and network architecture. We evaluate our methods on the MNIST, NIST-Round0 and NIST-Round1 datasets, with up to 1,000 pre-trained models, making this the largest study to date on Trojaned network detection, and achieve over 92% detection accuracy, setting a new state of the art.
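
The abstract does not spell out how the perturbations are bounded or how the high-energy region is located, so the following is only a minimal sketch of those two ideas under stated assumptions. It treats a perturbation as a 2-D grid of floats, shows the standard projections onto an $L_\infty$ ball and an $L_2$ ball, and reads "localized region of high energy" as the fixed-size square window with the largest sum of squared values. The function names (`project_linf`, `project_l2`, `highest_energy_window`) and the window-based definition of energy are hypothetical and are not taken from the paper.

```python
import math


def project_linf(delta, eps):
    """Clip every entry of a 2-D perturbation to the interval [-eps, eps]."""
    return [[max(-eps, min(eps, v)) for v in row] for row in delta]


def project_l2(delta, eps):
    """Rescale the perturbation so that its L2 norm is at most eps."""
    norm = math.sqrt(sum(v * v for row in delta for v in row))
    if norm <= eps:
        return [row[:] for row in delta]
    scale = eps / norm
    return [[v * scale for v in row] for row in delta]


def highest_energy_window(delta, size):
    """Return (top, left, energy) of the size x size window with the largest
    sum of squared values. Assumption: 'energy' means squared magnitude."""
    h, w = len(delta), len(delta[0])
    if size > h or size > w:
        raise ValueError("window larger than perturbation")
    # Summed-area table of squared values, padded with a zero row and column.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            sat[i + 1][j + 1] = (delta[i][j] ** 2 + sat[i][j + 1]
                                 + sat[i + 1][j] - sat[i][j])
    best = (0, 0, -1.0)
    for i in range(h - size + 1):
        for j in range(w - size + 1):
            energy = (sat[i + size][j + size] - sat[i][j + size]
                      - sat[i + size][j] + sat[i][j])
            if energy > best[2]:
                best = (i, j, energy)
    return best


# Toy perturbation: zeros everywhere except a 2x2 patch at row 1, column 2.
toy = [[0.0] * 4 for _ in range(4)]
for r in (1, 2):
    for c in (2, 3):
        toy[r][c] = 0.5
top, left, energy = highest_energy_window(toy, 2)
```

On the toy grid, the window search recovers the location of the non-zero patch. In a real pipeline the perturbation would come from gradient-based optimization against the model under test, which this sketch deliberately leaves out.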
