Paper Title
One-Time Model Adaptation to Heterogeneous Clients: An Intra-Client and Inter-Image Attention Design
Paper Authors
Paper Abstract
The mainstream workflow of image recognition applications is to first train one global model on the cloud for a wide range of classes and then serve numerous clients, each with heterogeneous images from a small subset of classes to be recognized. Given the cloud-client discrepancy in the range of image classes, the recognition model is expected to be strongly adaptive, intuitively by concentrating its focus on each individual client's local, dynamic class subset, while incurring negligible overhead. In this work, we propose to plug a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models, requiring only one-time cloud-based training to become client-adaptive. In particular, given a target image from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's historical unlabeled images, thereby calibrating the focus and the recognition result. Further considering that ICIIA's overhead is dominated by linear projection, we propose partitioned linear projection with feature shuffling as a replacement and allow increasing the number of partitions to dramatically improve efficiency without sacrificing too much accuracy. We finally evaluate ICIIA using 3 different recognition tasks with 9 backbone models over 5 representative datasets. Extensive evaluation results demonstrate the effectiveness and efficiency of ICIIA. Specifically, for ImageNet-1K with the backbone models of MobileNetV3-L and Swin-B, ICIIA can improve the test accuracy to 83.37% (+8.11%) and 88.86% (+5.28%), while adding only 1.62% and 0.02% of FLOPs, respectively.
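The efficiency argument can be made concrete. A minimal NumPy sketch, assuming the feature dimension is split evenly into partitions, each projected by its own small matrix, followed by a ShuffleNet-style channel shuffle to mix information across partitions (the function names and the exact shuffle pattern are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def partitioned_linear(x, weights):
    # x: (n, d); weights: list of p matrices, each of shape (d/p, d/p).
    # Each partition of the feature vector is projected independently,
    # costing p * (d/p)^2 = d^2 / p multiplies per vector instead of d^2.
    p = len(weights)
    parts = np.split(x, p, axis=-1)
    return np.concatenate([part @ w for part, w in zip(parts, weights)], axis=-1)

def feature_shuffle(x, p):
    # Interleave features across the p partitions (channel-shuffle style),
    # so stacked partitioned projections can still mix all features.
    n, d = x.shape
    return x.reshape(n, p, d // p).transpose(0, 2, 1).reshape(n, d)

rng = np.random.default_rng(0)
n, d, p = 2, 8, 4
x = rng.standard_normal((n, d))
weights = [rng.standard_normal((d // p, d // p)) for _ in range(p)]
y = feature_shuffle(partitioned_linear(x, weights), p)
# Full dense projection: d*d = 64 multiplies per vector;
# partitioned with p = 4: d*d/p = 16 multiplies per vector.
```

Increasing `p` shrinks both the parameter count and the FLOPs of each projection by a factor of `p`, which is how the abstract's trade-off between efficiency and accuracy is tuned.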