Paper Title
Few-View Object Reconstruction with Unknown Categories and Camera Poses
Paper Authors
Paper Abstract
While object reconstruction has made great strides in recent years, current methods typically require densely captured images and/or known camera poses, and generalize poorly to novel object categories. To step toward object reconstruction in the wild, this work explores reconstructing general real-world objects from a few images without known camera poses or object categories. The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation -- in a unified approach. Our approach captures the synergies of these two problems: reliable camera pose estimation gives rise to accurate shape reconstruction, and the accurate reconstruction, in turn, induces robust correspondence between different views and facilitates pose estimation. Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence for estimating relative camera poses. The 3D features are then transformed by the estimated poses into a shared space and are fused into a neural radiance field. The reconstruction results are rendered by volume rendering techniques, enabling us to train the model without 3D shape ground-truth. Our experiments show that FORGE reliably reconstructs objects from five views. Our pose estimation method outperforms existing ones by a large margin. The reconstruction results under predicted poses are comparable to the ones using ground-truth poses. The performance on novel testing categories matches the results on categories seen during training. Project page: https://ut-austin-rpl.github.io/FORGE/
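The abstract describes a pipeline in which per-view 3D features are transformed by estimated camera poses into a shared space, fused, and rendered with volume rendering. The following is a minimal, hypothetical sketch of those three stages; the function names, shapes, and the averaging-based fusion are illustrative assumptions, not FORGE's actual implementation.

```python
# Hypothetical sketch of the pipeline stages named in the abstract:
# (1) rigid-transform per-view 3D features into a shared frame using the
#     estimated pose, (2) fuse them, (3) alpha-composite along a ray
#     (standard NeRF-style volume rendering). All details are illustrative.
import numpy as np

def transform_points(points, R, t):
    """Apply a rigid transform (rotation R, translation t) to Nx3 points,
    standing in for warping view-specific features by the estimated pose."""
    return points @ R.T + t

def fuse_features(feature_list):
    """Fuse per-view feature arrays in the shared space; simple averaging
    is assumed here purely for illustration."""
    return np.mean(np.stack(feature_list, axis=0), axis=0)

def volume_render(densities, colors, deltas):
    """Standard alpha compositing along one ray: convert densities to
    per-sample alphas, accumulate transmittance, and weight the colors."""
    alphas = 1.0 - np.exp(-densities * deltas)          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                            # contribution per sample
    return (weights[:, None] * colors).sum(axis=0)      # composited RGB
```

Because rendering supervises the model with 2D images only, a loss on `volume_render` outputs is what allows training without 3D shape ground truth, as the abstract notes.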