Structural Prior Driven Regularized Deep Learning for Sonar Image Classification

Paper Title

Structural Prior Driven Regularized Deep Learning for Sonar Image Classification

Authors

Gerg, Isaac D., Monga, Vishal

Abstract

Deep learning has been recently shown to improve performance in the domain of synthetic aperture sonar (SAS) image classification. Given the constant resolution with range of a SAS, it is no surprise that deep learning techniques perform so well. Despite deep learning's recent success, there are still compelling open challenges in reducing the high false alarm rate and enabling success when training imagery is limited, which is a practical challenge that distinguishes the SAS classification problem from standard image classification set-ups where training imagery may be abundant. We address these challenges by exploiting prior knowledge that humans use to grasp the scene. These include unconscious elimination of the image speckle and localization of objects in the scene. We introduce a new deep learning architecture which incorporates these priors with the goal of improving automatic target recognition (ATR) from SAS imagery. Our proposal -- called SPDRDL, Structural Prior Driven Regularized Deep Learning -- incorporates the previously mentioned priors in a multi-task convolutional neural network (CNN) and requires no additional training data when compared to traditional SAS ATR methods. Two structural priors are enforced via regularization terms in the learning of the network: (1) structural similarity prior -- enhanced imagery (often through despeckling) aids human interpretation and is semantically similar to the original imagery and (2) structural scene context priors -- learned features ideally encapsulate target centering information; hence learning may be enhanced via a regularization that encourages fidelity against known ground truth target shifts (relative target position from scene center). Experiments on a challenging real-world dataset reveal that SPDRDL outperforms state-of-the-art deep learning and other competing methods for SAS image classification.
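
The abstract describes a multi-task CNN in which two structural priors act as regularization terms alongside the classification objective. The sketch below illustrates one way such a combined loss could be assembled; it is not the authors' implementation. The function names, the weights `lam_sim` and `lam_shift`, and the use of mean squared error as a stand-in for both the structural similarity term and the target-shift term are assumptions made purely for illustration.

```python
import math

def cross_entropy(class_probs, label):
    """Negative log-likelihood of the true class (classification task)."""
    return -math.log(class_probs[label])

def mean_squared_error(pred, ref):
    """Mean squared error between two equal-length sequences."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)

def multitask_loss(class_probs, label,
                   enhanced_pred, enhanced_ref,
                   shift_pred, shift_true,
                   lam_sim=0.5, lam_shift=0.5):
    """Hypothetical combined objective: classification + two regularizers.

    enhanced_pred / enhanced_ref: flattened pixel values of the network's
        enhanced image and a despeckled reference (assumed inputs).
    shift_pred / shift_true: predicted and ground-truth target offsets
        from the scene center, e.g. (dx, dy).
    """
    l_cls = cross_entropy(class_probs, label)
    # Assumption: MSE stands in for a structural similarity measure.
    l_sim = mean_squared_error(enhanced_pred, enhanced_ref)
    # Assumption: MSE on the target shift stands in for the scene context term.
    l_shift = mean_squared_error(shift_pred, shift_true)
    return l_cls + lam_sim * l_sim + lam_shift * l_shift

if __name__ == "__main__":
    loss = multitask_loss(
        class_probs=[0.5, 0.5], label=0,
        enhanced_pred=[1.0, 1.0], enhanced_ref=[1.0, 1.0],
        shift_pred=[2.0, 0.0], shift_true=[0.0, 0.0],
    )
    print(loss)
```

Because both auxiliary targets (a despeckled image and the target's offset from the scene center) can be derived from the existing labeled imagery, a loss of this shape is consistent with the abstract's claim that no additional training data is required.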
