Paper Title
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition
Paper Authors
Paper Abstract
Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years. A robust and efficient HAR system plays a pivotal role in fields such as video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. The task is made challenging by complex poses, varying viewpoints, and the environmental context in which the action takes place. To address such complexities, in this paper, we propose a novel Sparse Weighted Temporal Attention (SWTA) module that utilizes sparsely sampled video frames to obtain global weighted temporal attention. The proposed SWTA comprises two parts. First, a temporal segment network that sparsely samples a given set of frames. Second, weighted temporal attention, which fuses attention maps derived from optical flow with raw RGB images. This is followed by a basenet network, which comprises a convolutional neural network (CNN) module along with fully connected layers that perform activity recognition. The SWTA module can be plugged into existing deep CNN architectures, enabling them to learn temporal information while eliminating the need for a separate temporal stream. It has been evaluated on three publicly available benchmark datasets, namely Okutama, MOD20, and Drone-Action. The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets, thereby surpassing the previous state-of-the-art performance by margins of 25.26%, 18.56%, and 2.94%, respectively.
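To make the described pipeline concrete, below is a minimal PyTorch sketch of the steps named in the abstract: TSN-style sparse frame sampling, a weighted temporal attention that fuses optical-flow-derived attention maps with raw RGB frames, and a CNN basenet with fully connected layers. The class names, the sigmoid-of-flow-magnitude attention formulation, the ResNet-18 stand-in basenet, and the segment-averaging consensus are all illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the SWTA pipeline described in the abstract.
# All module names and design details below are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision.models as models


def sparse_sample(frames: torch.Tensor, num_segments: int) -> torch.Tensor:
    """TSN-style sparse sampling: pick one frame per temporal segment.
    frames: (T, C, H, W) video tensor."""
    t = frames.shape[0]
    idx = torch.linspace(0, t - 1, steps=num_segments).round().long()
    return frames[idx]                                   # (num_segments, C, H, W)


class SparseWeightedTemporalAttention(nn.Module):
    """Fuses an optical-flow-derived attention map with raw RGB frames
    (assumed formulation: flow magnitude -> sigmoid weight map)."""

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # rgb:  (N, 3, H, W) sparsely sampled RGB frames
        # flow: (N, 2, H, W) optical flow aligned with those frames
        magnitude = flow.norm(dim=1, keepdim=True)       # (N, 1, H, W)
        attention = torch.sigmoid(magnitude)             # weighted attention map
        return rgb * attention                           # attended RGB frames


class DroneHARModel(nn.Module):
    """SWTA plug-in followed by a CNN basenet and fully connected classifier."""

    def __init__(self, num_classes: int, num_segments: int = 8):
        super().__init__()
        self.num_segments = num_segments
        self.swta = SparseWeightedTemporalAttention()
        backbone = models.resnet18(weights=None)         # stand-in basenet
        backbone.fc = nn.Identity()                      # keep 512-d features
        self.basenet = backbone
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, frames: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        rgb = sparse_sample(frames, self.num_segments)   # (S, 3, H, W)
        flo = sparse_sample(flow, self.num_segments)     # (S, 2, H, W)
        attended = self.swta(rgb, flo)
        feats = self.basenet(attended)                   # (S, 512)
        # average segment-level features before classification (TSN-style consensus)
        return self.classifier(feats.mean(dim=0, keepdim=True))


# Usage example with random data: a 64-frame clip and a hypothetical 13-class label set.
model = DroneHARModel(num_classes=13)
video = torch.randn(64, 3, 224, 224)
flow = torch.randn(64, 2, 224, 224)
logits = model(video, flow)                              # shape: (1, 13)
```

The key design point this sketch tries to reflect is that temporal information enters through the attention weighting of sparsely sampled RGB frames, so the backbone remains a single-stream 2D CNN rather than a separate temporal stream.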