Paper Title

Semantic Distillation Guided Salient Object Detection

Paper Authors

Bo Xu, Guanze Liu, Han Huang, Cheng Lu, Yandong Guo

Paper Abstract

Most existing CNN-based salient object detection (SOD) methods can identify local segmentation details like hair and animal fur, but often misinterpret the real saliency due to the lack of global contextual information, which stems from the subjectiveness of the SOD task and the locality of convolution layers. Moreover, due to unrealistically expensive labeling costs, existing SOD datasets are insufficient to cover the real data distribution. The limitations and biases of the training data make it additionally difficult to fully explore the semantic associations between objects, and between objects and their environment, in a given image. In this paper, we propose a semantic distillation guided SOD (SDG-SOD) method that produces accurate results by fusing semantically distilled knowledge from generated image captions into a Vision-Transformer-based SOD framework. SDG-SOD can better uncover inter-object and object-to-environment saliency and bridge the gap between the subjective nature of SOD and its expensive labeling. Comprehensive experiments on five benchmark datasets demonstrate that SDG-SOD outperforms state-of-the-art approaches on four evaluation metrics, and largely improves model performance on the DUTS, ECSSD, DUT, HKU-IS, and PASCAL-S datasets.
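
The abstract describes fusing caption-derived semantic knowledge into a Vision-Transformer SOD backbone but does not specify the architecture. Below is a minimal, self-contained PyTorch sketch of one plausible fusion scheme: visual patch tokens cross-attend to caption embeddings, and a toy head upsamples patch-level predictions into a dense saliency map. All names (CaptionGuidedFusion, ToySaliencyHead), dimensions, and the cross-attention design are illustrative assumptions, not SDG-SOD's actual method.

```python
import torch
import torch.nn as nn

class CaptionGuidedFusion(nn.Module):
    """Hypothetical fusion block: visual tokens attend to caption embeddings.

    Illustrative sketch only; the real SDG-SOD fusion is not described
    in the abstract.
    """
    def __init__(self, dim: int = 384, num_heads: int = 6):
        super().__init__()
        # Cross-attention: queries come from visual tokens,
        # keys/values from caption (text) embeddings.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, caption_embeds):
        # visual_tokens: (B, N_patches, dim), e.g. ViT patch tokens
        # caption_embeds: (B, N_words, dim), e.g. from a captioning model
        fused, _ = self.cross_attn(visual_tokens, caption_embeds, caption_embeds)
        return self.norm(visual_tokens + fused)  # residual connection


class ToySaliencyHead(nn.Module):
    """Maps fused patch tokens back to a dense saliency map."""
    def __init__(self, dim: int = 384, patch_grid: int = 14):
        super().__init__()
        self.patch_grid = patch_grid
        self.proj = nn.Linear(dim, 1)

    def forward(self, tokens):
        b, n, _ = tokens.shape
        logits = self.proj(tokens)  # (B, N, 1)
        sal = logits.transpose(1, 2).reshape(b, 1, self.patch_grid, self.patch_grid)
        # Upsample patch-level logits to image resolution (16x16 patches assumed).
        return torch.sigmoid(nn.functional.interpolate(
            sal, scale_factor=16, mode="bilinear", align_corners=False))


if __name__ == "__main__":
    B, N, W, D = 2, 14 * 14, 12, 384
    visual = torch.randn(B, N, D)    # stand-in for ViT patch tokens
    caption = torch.randn(B, W, D)   # stand-in for caption embeddings
    fused = CaptionGuidedFusion(D)(visual, caption)
    saliency = ToySaliencyHead(D)(fused)
    print(saliency.shape)  # torch.Size([2, 1, 224, 224])
```

Cross-attention is a common way to inject text context into visual tokens; the actual model may fuse the two modalities differently, or at multiple stages of the transformer.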
