Paper title
Transformer Compressed Sensing via Global Image Tokens
Paper authors
Paper abstract
Convolutional neural networks (CNN) have demonstrated outstanding Compressed Sensing (CS) performance compared to traditional, hand-crafted methods. However, they are broadly limited in terms of generalisability, inductive bias, and difficulty in modelling long-distance relationships. Transformer neural networks (TNN) overcome such issues by implementing an attention mechanism designed to capture dependencies between inputs. However, high-resolution tasks typically require vision Transformers (ViT) to decompose an image into patch-based tokens, limiting inputs to inherently local contexts. We propose a novel image decomposition that naturally embeds images into low-resolution inputs. These Kaleidoscope tokens (KD) provide a mechanism for global attention at the same computational cost as a patch-based approach. To showcase this development, we replace CNN components in a well-known CS-MRI neural network with TNN blocks and demonstrate the improvements afforded by KD. We also propose an ensemble of image tokens, which enhances overall image quality and reduces model size. Supplementary material is available at https://github.com/uqmarlonbran/TCS.git
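The contrast the abstract draws between patch-based tokens (local context) and Kaleidoscope tokens (global context at the same cost) can be illustrated with a small sketch. The abstract does not specify the exact KD construction; the version below assumes it behaves like strided subsampling, where each token is a low-resolution copy of the whole image, so this is a hedged illustration of the idea rather than the paper's actual method (see the linked repository for the authors' implementation). The function names and the choice of numpy are mine.

```python
import numpy as np

def patch_tokens(img, p):
    """Standard ViT-style tokenization: non-overlapping p x p patches.
    Each token sees only a local p x p neighbourhood of the image."""
    H, W = img.shape
    return (img.reshape(H // p, p, W // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p, p))

def kaleidoscope_tokens(img, s):
    """Assumed KD-style tokenization: strided subsampling with stride s.
    Token (i, j) collects pixels (i::s, j::s), so every token is a
    low-resolution copy of the FULL image (global spatial support)."""
    return np.stack([img[i::s, j::s]
                     for i in range(s) for j in range(s)])

img = np.arange(64, dtype=float).reshape(8, 8)

# Same token count and token size, hence the same attention cost,
# but very different receptive fields per token.
pt = patch_tokens(img, 2)          # 16 tokens of shape (2, 2), each local
kt = kaleidoscope_tokens(img, 4)   # 16 tokens of shape (2, 2), each global
assert pt.shape == kt.shape == (16, 2, 2)
```

With an 8×8 image, both decompositions yield sixteen 2×2 tokens, so a Transformer attending over them does the same amount of work; the difference is that each patch token covers one corner of the image while each subsampled token spans all of it, which is the mechanism the abstract credits for global attention.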