Paper title
TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning
Paper authors
Paper abstract
Are we on the right track for remote sensing image understanding (RSIU) when we train models in a supervised, data-dependent and task-dependent way, rather than in a label-free and task-independent way as human vision does? We argue that a more desirable RSIU model should be trained with the intrinsic structure of the data rather than extrinsic human labels, so that it generalizes across a wide range of RSIU tasks. Following this hypothesis, we propose The Original Vision model (TOV) for the remote sensing field. Trained on massive unlabeled optical data along a human-like self-supervised learning (SSL) path that goes from general knowledge to specialized knowledge, the TOV model can be easily adapted to various RSIU tasks, including scene classification, object detection, and semantic segmentation, and it outperforms the dominant ImageNet supervised pretraining method as well as two recently proposed SSL pretraining methods on the majority of 12 publicly available benchmarks. Moreover, we analyze the influence of two key factors on the performance of the TOV model for RSIU: the data sampling method and the choice of learning path during self-supervised optimization. We believe that a general model trained in a label-free and task-independent way may be the next paradigm for RSIU, and we hope the insights distilled from this study can help foster the development of an original vision model for RSIU.