Paper title
S$^2$-FPN: Scale-aware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation
Paper authors
Paper abstract
Modern high-performance semantic segmentation methods employ a heavy backbone and dilated convolutions to extract relevant features. Although extracting features with both contextual and semantic information is critical for segmentation tasks, it incurs a large memory footprint and high computational cost for real-time applications. This paper presents a new model that achieves a trade-off between accuracy and speed for real-time road scene semantic segmentation. Specifically, we propose a lightweight model named Scale-aware Strip Attention Guided Feature Pyramid Network (S$^2$-FPN). Our network consists of three main modules: the Attention Pyramid Fusion (APF) module, the Scale-aware Strip Attention Module (SSAM), and the Global Feature Upsample (GFU) module. APF adopts an attention mechanism to learn discriminative multi-scale features and to help close the semantic gap between different levels. APF uses scale-aware attention to encode global context with a vertical striping operation and to model long-range dependencies, which helps relate pixels with similar semantic labels. In addition, APF employs a channel-wise reweighting block (CRB) to emphasize the channel features. Finally, the decoder of S$^2$-FPN adopts GFU, which fuses features from APF and the encoder. Extensive experiments on two challenging semantic segmentation benchmarks demonstrate that our approach achieves a better accuracy/speed trade-off under different model settings. The proposed models achieve 76.2\% mIoU at 87.3 FPS, 77.4\% mIoU at 67 FPS, and 77.8\% mIoU at 30.5 FPS on the Cityscapes dataset, and 69.6\% mIoU, 71.0\% mIoU, and 74.2\% mIoU on the CamVid dataset. The code for this work will be made available at \url{https://github.com/mohamedac29/S2-FPN}.
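To make the idea of encoding context along vertical strips more concrete, the following is a minimal toy sketch, not the authors' SSAM. It assumes a single-channel feature map stored as a list of rows, average pooling over the height axis as the strip operation, and a plain sigmoid gate as the attention weight; the function names `vertical_strip_pool` and `strip_attention` are hypothetical and do not come from the paper or its repository.

```python
import math


def vertical_strip_pool(feature):
    """Average each column over the height axis (H x W -> W values).

    Assumption: the "vertical strip" is read here as one full-height column.
    """
    height = len(feature)
    width = len(feature[0])
    return [sum(feature[r][c] for r in range(height)) / height
            for c in range(width)]


def strip_attention(feature):
    """Reweight every pixel by a sigmoid gate derived from its column context.

    Each pixel is scaled by a value that depends on the whole column, so
    distant rows influence one another through the pooled strip.
    """
    context = vertical_strip_pool(feature)
    gates = [1.0 / (1.0 + math.exp(-value)) for value in context]
    return [[feature[r][c] * gates[c] for c in range(len(gates))]
            for r in range(len(feature))]


if __name__ == "__main__":
    toy_feature = [[1.0, 2.0],
                   [3.0, 4.0]]
    print(vertical_strip_pool(toy_feature))
    print(strip_attention(toy_feature))
```

Pooling along one axis costs a single pass over the feature map, which may be one reason strip-style context is attractive for real-time models; the actual module in the paper is likely to involve learned projections and multiple scales that this sketch omits.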