Paper Title
How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting
Paper Authors
Paper Abstract
Accurate prediction of future human positions is an essential task for modern video-surveillance systems. Current state-of-the-art models usually rely on a "history" of past tracked locations (e.g., 3 to 5 seconds) to predict a plausible sequence of future locations (e.g., up to the next 5 seconds). We argue that this common schema neglects critical traits of realistic applications: since the collection of input trajectories involves machine perception (i.e., detection and tracking), incorrect detections and fragmentation errors may accumulate in crowded scenes, leading to tracking drift. As a result, the model would be fed with corrupted and noisy input data, severely degrading its prediction performance. In this regard, we focus on delivering accurate predictions when only a few input observations are available, thus potentially lowering the risks associated with automatic perception. To this end, we conceive a novel distillation strategy that transfers knowledge from a teacher network to a student network, the latter fed with fewer observations (just two). We show that a properly defined teacher supervision allows the student network to perform comparably to state-of-the-art approaches that demand more observations. Moreover, extensive experiments on common trajectory forecasting datasets highlight that our student network generalizes better to unseen scenarios.
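To make the teacher/student setup concrete, here is a minimal numerical sketch of the idea described in the abstract: a teacher forecasts from the full observed history, a student from only the last two positions, and the student's training objective combines a ground-truth term with a term pulling it toward the teacher's output. Everything here (the constant-velocity predictors, the weight `alpha`, the synthetic data) is illustrative and not the paper's actual networks or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D trajectory: T_obs observed positions, T_pred future positions.
T_obs, T_pred = 8, 12
history = np.cumsum(rng.normal(size=(T_obs, 2)), axis=0)   # synthetic track
future = history[-1] + np.cumsum(rng.normal(size=(T_pred, 2)), axis=0)

def constant_velocity_forecast(obs, t_pred):
    """Extrapolate with the mean velocity of the observed segment
    (a stand-in for a learned forecasting network)."""
    v = (obs[-1] - obs[0]) / max(len(obs) - 1, 1)
    steps = np.arange(1, t_pred + 1)[:, None]
    return obs[-1] + steps * v

# Teacher sees the full history; student sees only the last two points.
teacher_pred = constant_velocity_forecast(history, T_pred)
student_pred = constant_velocity_forecast(history[-2:], T_pred)

# Distillation-style objective: a task term against the ground truth plus
# a term matching the student to the teacher; alpha is a hypothetical weight.
alpha = 0.5
task_loss = np.mean((student_pred - future) ** 2)
distill_loss = np.mean((student_pred - teacher_pred) ** 2)
total_loss = task_loss + alpha * distill_loss
```

In training, minimizing the second term encourages the two-observation student to reproduce the smoother forecasts the teacher obtains from its longer, less noise-sensitive history.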