FCN-Transformer Feature Fusion for Polyp Segmentation

Paper title

FCN-Transformer Feature Fusion for Polyp Segmentation

Authors

Edward Sanderson, Bogdan J. Matuszewski

Abstract

Colonoscopy is widely recognised as the gold standard procedure for the early detection of colorectal cancer (CRC). Segmentation is valuable for two significant clinical applications, namely lesion detection and classification, providing means to improve accuracy and robustness. The manual segmentation of polyps in colonoscopy images is time-consuming. As a result, the use of deep learning (DL) for automation of polyp segmentation has become important. However, DL-based solutions can be vulnerable to overfitting and the resulting inability to generalise to images captured by different colonoscopes. Recent transformer-based architectures for semantic segmentation both achieve higher performance and generalise better than alternatives, but typically predict a segmentation map of $\frac{h}{4}\times\frac{w}{4}$ spatial dimensions for an $h\times w$ input image. To this end, we propose a new architecture for full-size segmentation which leverages the strengths of a transformer in extracting the most important features for segmentation in a primary branch, while compensating for its limitations in full-size prediction with a secondary fully convolutional branch. The resulting features from both branches are then fused for final prediction of an $h\times w$ segmentation map. We demonstrate our method's state-of-the-art performance with respect to the mDice, mIoU, mPrecision, and mRecall metrics, on both the Kvasir-SEG and CVC-ClinicDB dataset benchmarks. Additionally, we train the model on each of these datasets and evaluate on the other to demonstrate its superior generalisation performance.
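
The four metrics named in the abstract can be illustrated with a small sketch. It assumes the usual definitions for binary masks (Dice = 2TP / (2TP + FP + FN), IoU = TP / (TP + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN)) and assumes the "m" prefix means an average over the images of a test set. The function names, the toy masks, and the handling of empty denominators are illustrative choices, not taken from the paper.

```python
from statistics import mean


def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives for one image.

    `pred` and `target` are binary masks given as nested lists of 0/1 values.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def safe_ratio(numerator, denominator):
    # Assumption: a ratio with an empty denominator is scored as 1.0
    # (e.g. no polyp in the ground truth and none predicted).
    return numerator / denominator if denominator else 1.0


def image_metrics(pred, target):
    """Dice, IoU, precision and recall for a single predicted mask."""
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "Dice": safe_ratio(2 * tp, 2 * tp + fp + fn),
        "IoU": safe_ratio(tp, tp + fp + fn),
        "Precision": safe_ratio(tp, tp + fp),
        "Recall": safe_ratio(tp, tp + fn),
    }


def mean_metrics(preds, targets):
    """Average each per-image metric over the whole set (the assumed 'm' prefix)."""
    per_image = [image_metrics(p, t) for p, t in zip(preds, targets)]
    return {"m" + name: mean(m[name] for m in per_image) for name in per_image[0]}


# Two toy 2x2 masks: one partially wrong prediction, one perfect prediction.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [1, 0]], [[1, 0], [0, 0]]]
scores = mean_metrics(preds, targets)
```

For the first toy image the counts are TP = 1, FP = 1, FN = 1, giving a Dice of 0.5 and an IoU of 1/3. The second image scores 1.0 on every metric, so the averaged mDice is 0.75 and the averaged mIoU is 2/3.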

Paper title