Paper Title

Towards End-to-end Video-based Eye-Tracking

Paper Authors

Seonwook Park, Emre Aksan, Xucong Zhang, Otmar Hilliges

Paper Abstract

Estimating eye-gaze from images alone is a challenging task, in large part due to unobservable person-specific factors. Achieving high accuracy typically requires labeled data from test users, which may not be attainable in real applications. We observe that there exists a strong relationship between what users are looking at and the appearance of the user's eyes. In response to this understanding, we propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships. Our video dataset consists of time-synchronized screen recordings, user-facing camera views, and eye gaze data, which allows for new benchmarks in temporal gaze tracking as well as label-free refinement of gaze. Importantly, we demonstrate that the fusion of information from visual stimuli as well as eye images can lead towards achieving performance similar to literature-reported figures acquired through supervised personalization. Our final method yields significant performance improvements on our proposed EVE dataset, with up to a 28% improvement in Point-of-Gaze estimates (resulting in an angular error of 2.49 degrees), paving the path towards high-accuracy screen-based eye tracking purely from webcam sensors. The dataset and reference source code are available at https://ait.ethz.ch/projects/2020/EVE.
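
For readers unfamiliar with the two quantities quoted in the abstract: angular error measures the angle between predicted and ground-truth 3D gaze directions, while the Point-of-Gaze (PoG) is the 2D location on the screen where the gaze ray lands. The sketch below illustrates how such quantities are typically computed; the function names, the pitch/yaw convention, and the coordinate frame are illustrative assumptions and are not taken from the paper's reference code.

```python
import numpy as np

def pitchyaw_to_vector(pitch, yaw):
    """Convert gaze angles (radians) to a unit 3D direction vector.

    Assumed convention (not from the paper's code): camera looks along +z,
    x points right, y points down; pitch is elevation, yaw is azimuth.
    """
    return np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])

def angular_error_deg(pred, true):
    """Angle in degrees between predicted and ground-truth gaze vectors."""
    pred = pred / np.linalg.norm(pred)
    true = true / np.linalg.norm(true)
    return np.degrees(np.arccos(np.clip(pred @ true, -1.0, 1.0)))

def point_of_gaze(eye_origin, gaze_dir, screen_point, screen_normal):
    """Intersect the gaze ray with the screen plane to obtain the PoG.

    eye_origin: 3D eye position; gaze_dir: unit gaze direction;
    screen_point / screen_normal: any point on the screen plane and its
    normal, all expressed in the same (e.g. camera) coordinate system.
    """
    t = ((screen_point - eye_origin) @ screen_normal) / (gaze_dir @ screen_normal)
    return eye_origin + t * gaze_dir
```

Given a calibrated camera-to-screen transform, the 3D intersection point returned by `point_of_gaze` can then be mapped to on-screen pixel coordinates, which is the space in which PoG estimates are compared.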
