Paper Title

Explicitly Modeled Attention Maps for Image Classification

Paper Authors

Andong Tan, Duc Tam Nguyen, Maximilian Dax, Matthias Nießner, Thomas Brox

Paper Abstract

Self-attention networks have shown remarkable progress in computer vision tasks such as image classification. The main benefit of the self-attention mechanism is the ability to capture long-range feature interactions in attention maps. However, the computation of attention maps requires a learnable key, query, and positional encoding, whose usage is often not intuitive and computationally expensive. To mitigate this problem, we propose a novel self-attention module with explicitly modeled attention maps that use only a single learnable parameter, yielding low computational overhead. The design of explicitly modeled attention maps with a geometric prior is based on the observation that the spatial context of a given pixel within an image is mostly dominated by its neighbors, while more distant pixels contribute less. Concretely, the attention maps are parametrized via simple functions (e.g., a Gaussian kernel) with a learnable radius, modeled independently of the input content. Our evaluation shows that our method achieves an accuracy improvement of up to 2.2% over the ResNet baselines on ImageNet ILSVRC and outperforms other self-attention methods such as AA-ResNet152 by 0.9% in accuracy with 6.4% fewer parameters and 6.7% fewer GFLOPs. This result empirically indicates the value of incorporating a geometric prior into the self-attention mechanism when applied to image classification.
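The core idea in the abstract is an attention map that depends only on pixel geometry and a single learnable radius, not on the input content. The sketch below illustrates that idea in NumPy: a Gaussian kernel over pairwise pixel distances is row-normalized and used to aggregate a value feature map. The function names, the row-normalization, and the way the map is applied to values are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
import numpy as np

def gaussian_attention_map(height, width, radius):
    """Content-independent attention map: each spatial position attends to its
    neighbors with Gaussian weights controlled by a single radius parameter.
    Minimal sketch of the idea in the abstract; details may differ in the paper."""
    # Pixel coordinates as an (H*W, 2) array.
    coords = np.stack(
        np.meshgrid(np.arange(height), np.arange(width), indexing="ij"), axis=-1
    ).reshape(-1, 2)
    # Squared Euclidean distance between every pair of pixel locations: (H*W, H*W).
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    weights = np.exp(-d2 / (2.0 * radius ** 2))
    # Row-normalize so each query position's weights sum to one (softmax-like).
    return weights / weights.sum(axis=1, keepdims=True)

def apply_attention(values, radius):
    """values: (H, W, C) feature map; returns attended features of the same shape."""
    h, w, c = values.shape
    attn = gaussian_attention_map(h, w, radius)   # (H*W, H*W), no key/query needed
    out = attn @ values.reshape(h * w, c)         # aggregate neighboring features
    return out.reshape(h, w, c)

# Example: 8x8 feature map with 16 channels; `radius` plays the role of the
# single learnable parameter mentioned in the abstract (fixed here for the demo).
feat = np.random.randn(8, 8, 16).astype(np.float32)
out = apply_attention(feat, radius=2.0)
print(out.shape)  # (8, 8, 16)
```

Because such a map depends only on spatial positions, it can presumably be precomputed per feature-map size and shared across channels and images, which is consistent with the low computational overhead claimed in the abstract.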
