Paper Title
Axially Expanded Windows for Local-Global Interaction in Vision Transformers
Paper Authors
Paper Abstract
Recently, Transformers have shown promising performance in various vision tasks. A challenging issue in Transformer design is that global self-attention is very expensive to compute, especially for high-resolution vision tasks. Local self-attention performs the attention computation within a local region to improve efficiency, but this limits the receptive field of a single attention layer, resulting in insufficient context modeling. When observing a scene, humans usually focus on a local region while attending to non-attentional regions at coarse granularity. Based on this observation, we develop an axially expanded window self-attention mechanism that performs fine-grained self-attention within the local window and coarse-grained self-attention along the horizontal and vertical axes, and thus can effectively capture both short- and long-range visual dependencies.
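To make the abstract's idea concrete, the sketch below illustrates one plausible reading of the mechanism: each query token attends at fine granularity to tokens inside its local window, and at coarse granularity to average-pooled tokens along its full row and column. This is a hypothetical single-head numpy simplification (identity query/key/value projections, no learned parameters); the `window` and `pool` sizes, and the exact pooling scheme, are assumptions, not the paper's specification.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def axially_expanded_attention(x, window=2, pool=2):
    """Sketch of local + axial attention for a feature map x of shape
    (H, W, C). H and W are assumed divisible by `window` and `pool`.
    Values equal keys here for simplicity."""
    H, W, C = x.shape
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            q = x[i, j]
            # fine-grained keys: tokens in the query's local window
            i0, j0 = (i // window) * window, (j // window) * window
            local = x[i0:i0 + window, j0:j0 + window].reshape(-1, C)
            # coarse-grained keys: average-pooled tokens along the
            # query's horizontal and vertical axes
            row = x[i].reshape(W // pool, pool, C).mean(axis=1)
            col = x[:, j].reshape(H // pool, pool, C).mean(axis=1)
            keys = np.concatenate([local, row, col], axis=0)
            attn = softmax(keys @ q / np.sqrt(C))
            out[i, j] = attn @ keys
    return out
```

Because the coarse axial keys cover the whole row and column at reduced resolution, each query sees the full image extent in one layer while the quadratic fine-grained cost stays confined to the local window.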