Paper Title
HAA500: Human-Centric Atomic Action Dataset with Curated Videos
Paper Authors
Paper Abstract
We contribute HAA500, a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591K labeled frames. To minimize ambiguities in action classification, HAA500 consists of highly diversified classes of fine-grained atomic actions, where only consistent actions fall under the same label, e.g., "Baseball Pitching" vs. "Free Throw in Basketball". HAA500 thus differs from existing atomic action datasets, in which coarse-grained atomic actions are labeled with coarse action verbs such as "Throw". HAA500 has been carefully curated to capture the precise movement of human figures with little class-irrelevant motion or spatio-temporal label noise. The advantages of HAA500 are fourfold: 1) human-centric actions, with a high average of 69.7% detectable joints for the relevant human poses; 2) high scalability, since adding a new class can be done in 20-60 minutes; 3) curated videos that capture the essential elements of an atomic action without irrelevant frames; 4) fine-grained atomic action classes. Our extensive experiments, including cross-data validation using datasets collected in the wild, demonstrate the clear benefits of the human-centric and atomic characteristics of HAA500, which enable even a baseline deep learning model to improve its predictions by attending to atomic human poses. We detail the HAA500 dataset statistics and collection methodology, and compare it quantitatively with existing action recognition datasets.
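
To make the abstract's pose-attending baseline concrete, below is a minimal PyTorch sketch, not the authors' architecture: per-frame appearance features are gated by encoded 2D joint detections before classification over the 500 atomic classes. All module names, dimensions, the 17-joint layout, and the gating scheme are illustrative assumptions.

# A minimal sketch (hypothetical, not the HAA500 authors' model) of a
# pose-attending baseline: video features are fused with detected 2D joint
# coordinates so the classifier can attend to atomic human poses.
import torch
import torch.nn as nn

class PoseAttendingBaseline(nn.Module):
    def __init__(self, num_classes=500, num_joints=17, feat_dim=512):
        super().__init__()
        # Per-frame appearance features (stand-in for any video backbone).
        self.rgb_encoder = nn.Sequential(
            nn.Flatten(start_dim=2),   # (B, T, C, H, W) -> (B, T, C*H*W)
            nn.LazyLinear(feat_dim),
            nn.ReLU(),
        )
        # Encode (x, y, confidence) for each joint in each frame.
        self.pose_encoder = nn.Sequential(
            nn.Linear(num_joints * 3, feat_dim),
            nn.ReLU(),
        )
        # Pose features gate the appearance features frame by frame.
        self.gate = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frames, joints):
        # frames: (B, T, C, H, W); joints: (B, T, num_joints, 3)
        rgb = self.rgb_encoder(frames)                          # (B, T, feat_dim)
        pose = self.pose_encoder(joints.flatten(start_dim=2))   # (B, T, feat_dim)
        attended = rgb * torch.sigmoid(self.gate(pose))         # pose-gated features
        return self.classifier(attended.mean(dim=1))            # average over time

model = PoseAttendingBaseline()
logits = model(torch.randn(2, 8, 3, 32, 32), torch.randn(2, 8, 17, 3))
print(logits.shape)  # torch.Size([2, 500])

The gating design is one simple way to exploit HAA500's high joint detectability (69.7% on average): frames whose pose encoding is uninformative contribute less to the clip-level prediction.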