Paper title
Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark
Paper authors
Abstract
Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) to provide an approximate pose estimate or (2) to determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for both purposes. These algorithms are often trained to retrieve the same landmark under a large range of viewpoint changes, a goal that often differs from the requirements of visual localization. To investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval in multiple visual localization paradigms. First, we introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets, using localization performance as the metric. Second, we investigate several definitions of "ground truth" for image retrieval. Using these definitions as upper bounds for the visual localization paradigms, we show that there is still significant room for improvement. Third, using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance for only some of the paradigms, not all. Finally, we analyze the effects of blur and dynamic scenes in the images. We conclude that there is a need for retrieval approaches specifically designed for localization paradigms. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization.