Paper Title

UmeTrack: Unified multi-view end-to-end hand tracking for VR

Paper Authors

Shangchen Han, Po-chen Wu, Yubo Zhang, Beibei Liu, Linguang Zhang, Zheng Wang, Weiguang Si, Peizhao Zhang, Yujun Cai, Tomas Hodan, Randi Cabezas, Luan Tran, Muzaffer Akbay, Tsz-Ho Yu, Cem Keskin, Robert Wang

Paper Abstract

Real-time tracking of 3D hand pose in world space is a challenging problem and plays an important role in VR interaction. Existing work in this space is limited to either producing root-relative (versus world-space) 3D pose or relying on multiple stages, such as generating heatmaps and kinematic optimization, to obtain 3D pose. Moreover, the typical VR scenario, which involves multi-view tracking from wide field-of-view (FOV) cameras, is seldom addressed by these methods. In this paper, we present a unified end-to-end differentiable framework for multi-view, multi-frame hand tracking that directly predicts 3D hand pose in world space. We demonstrate the benefits of end-to-end differentiability by extending our framework with downstream tasks such as jitter reduction and pinch prediction. To demonstrate the efficacy of our model, we further present a new large-scale egocentric hand pose dataset that consists of both real and synthetic data. Experiments show that our system trained on this dataset handles various challenging interactive motions and has been successfully applied to real-time VR applications.
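To make the pipeline described in the abstract more concrete, below is a minimal PyTorch sketch of a multi-view, multi-frame tracker that regresses world-space 3D hand keypoints end to end. The module names, layer sizes, and the averaging/GRU fusion scheme are illustrative assumptions only, not the actual UmeTrack architecture.

```python
import torch
import torch.nn as nn


class MultiViewHandTracker(nn.Module):
    """Hypothetical sketch: fuse per-view image features across views and
    frames, then directly regress world-space 3D keypoints. All design
    choices here are assumptions, not the UmeTrack paper's architecture."""

    def __init__(self, num_keypoints: int = 21, feat_dim: int = 256):
        super().__init__()
        # Shared per-view image encoder (stand-in for a real CNN backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Temporal fusion across frames (the multi-frame aspect).
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Regression head: predicts world-space 3D keypoints per frame.
        self.head = nn.Linear(feat_dim, num_keypoints * 3)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, frames, views, 1, H, W) monochrome camera crops.
        b, t, v, c, h, w = images.shape
        feats = self.encoder(images.reshape(b * t * v, c, h, w))
        # Fuse views by averaging, then fuse frames with the GRU.
        feats = feats.reshape(b, t, v, -1).mean(dim=2)
        fused, _ = self.temporal(feats)
        # One world-space pose per frame: (batch, frames, K, 3).
        return self.head(fused).reshape(b, t, -1, 3)


# Usage sketch: 2 frames from 2 wide-FOV cameras, 96x96 monochrome crops.
tracker = MultiViewHandTracker()
pose = tracker(torch.randn(1, 2, 2, 1, 96, 96))  # -> (1, 2, 21, 3)
```

Because every stage is differentiable, downstream objectives such as a jitter-reduction (temporal smoothness) loss or a pinch-prediction head could, in principle, be attached to `fused` and trained jointly, which is the kind of extension the abstract alludes to.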
