Paper Title
Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition
Paper Authors
Paper Abstract
In this paper, we focus on unsupervised representation learning for skeleton-based action recognition. Existing approaches usually learn action representations by sequential prediction, but they suffer from an inability to fully capture semantic information. To address this limitation, we propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP), which not only performs reverse sequential prediction to learn low-level information (e.g., body posture at every frame) and high-level patterns (e.g., motion order), but also devises action prototypes to implicitly encode the semantic similarity shared among sequences. In general, we regard action prototypes as latent variables and formulate PCRP as an expectation-maximization task. Specifically, PCRP iteratively runs (1) the E-step, which determines the distribution of prototypes by clustering action encodings from the encoder, and (2) the M-step, which optimizes the encoder by minimizing the proposed ProtoMAE loss, simultaneously pulling each action encoding closer to its assigned prototype and performing the reverse prediction task. Extensive experiments on the N-UCLA, NTU 60, and NTU 120 datasets show that PCRP outperforms state-of-the-art unsupervised methods and even achieves superior performance over some supervised methods. Code is available at https://github.com/Mikexu007/PCRP.
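The EM-style alternation described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration under loose assumptions, not the authors' implementation: the skeleton encoder, the reverse-prediction decoder, and the exact ProtoMAE loss are replaced by toy action encodings, a k-means-style prototype update for the E-step, and a generic InfoNCE-style prototype-contrast term standing in for the contrastive part of the M-step objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def e_step(encodings, prototypes):
    # E-step: assign each action encoding to its nearest prototype (cluster).
    dists = np.linalg.norm(encodings[:, None, :] - prototypes[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def update_prototypes(encodings, assignments, k):
    # Recompute each prototype as the mean of its assigned encodings
    # (a random encoding is used as a fallback for an empty cluster).
    return np.stack([
        encodings[assignments == c].mean(axis=0) if np.any(assignments == c)
        else encodings[rng.integers(len(encodings))]
        for c in range(k)
    ])

def proto_contrast_loss(encodings, prototypes, assignments, temp=0.1):
    # InfoNCE-style term: pull each encoding toward its assigned prototype and
    # push it away from the others (stand-in for the paper's ProtoMAE loss,
    # which additionally includes the reverse-prediction objective).
    logits = encodings @ prototypes.T / temp
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(encodings)), assignments].mean()

# Toy "action encodings": two well-separated groups in feature space.
enc = np.concatenate([rng.normal(0.0, 0.1, (20, 8)),
                      rng.normal(3.0, 0.1, (20, 8))])
protos = np.stack([enc[0], enc[-1]])       # one initial prototype per group
for _ in range(5):                         # alternate E-step / prototype update
    assign = e_step(enc, protos)
    protos = update_prototypes(enc, assign, k=2)
loss = proto_contrast_loss(enc, protos, assign)
```

In the actual framework the M-step would back-propagate this loss (plus the reverse-prediction term) into the encoder; here the prototype-mean update merely illustrates how assignments and prototypes are refined in alternation.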