On Localizing a Camera from a Single Image

Title

On Localizing a Camera from a Single Image

Authors

Pradipta Ghosh, Xiaochen Liu, Hang Qiu, Marcos A. M. Vieira, Gaurav S. Sukhatme, Ramesh Govindan

Abstract

Public cameras often have limited metadata describing their attributes. A key missing attribute is the precise location of the camera, which would make it possible to pinpoint the location of events seen by the camera. In this paper, we explore the following question: under what conditions is it possible to estimate the location of a camera from a single image taken by that camera? We show that, using a judicious combination of projective geometry, neural networks, and crowd-sourced annotations from human workers, it is possible to position 95% of the images in our test data set to within 12 m. This performance is two orders of magnitude better than that of PoseNet, a state-of-the-art neural network that, when trained on a large corpus of images of an area, can estimate the camera pose from a single image. Finally, we show that the camera's inferred position and intrinsic parameters can help design a number of virtual sensors, all of which are reasonably accurate.
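
The headline result, 95% of test images positioned to within 12 m, can be read as the fraction of images whose localization error (distance between estimated and true camera position) is at most a distance threshold. The following is a minimal sketch of that computation, not the paper's evaluation code. It assumes that estimated and ground-truth positions are already expressed in a shared local metric frame (for example, east/north offsets in metres). The function name `within_threshold_fraction` and the sample coordinates are hypothetical.

```python
import math

def within_threshold_fraction(estimates, ground_truth, threshold_m=12.0):
    """Return the fraction of images whose estimated camera position lies
    within threshold_m metres of the ground-truth position.

    Assumption (hypothetical setup): positions are tuples of coordinates in a
    shared local metric frame, so Euclidean distance is the error in metres.
    """
    if len(estimates) != len(ground_truth):
        raise ValueError("estimates and ground_truth must have the same length")
    if not estimates:
        raise ValueError("need at least one image")
    # Count images whose straight-line localization error is within the threshold.
    hits = sum(
        1
        for est, gt in zip(estimates, ground_truth)
        if math.dist(est, gt) <= threshold_m
    )
    return hits / len(estimates)

# Hypothetical example: errors of 0 m, 5 m, about 14.1 m and about 1.4 m.
est = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0), (1.0, 1.0)]
gt = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
print(within_threshold_fraction(est, gt))  # 0.75
```

Under this reading, the paper's claim corresponds to the returned fraction being at least 0.95 for `threshold_m=12.0` on its test set.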
