Paper Title
CLUE-AI: A Convolutional Three-stream Anomaly Identification Framework for Robot Manipulation
Paper Authors
Paper Abstract
Robot safety has been a prominent research topic in recent years since robots are more involved in daily tasks. It is crucial to devise the required safety mechanisms to enable service robots to be aware of and react to anomalies (i.e., unexpected deviations from intended outcomes) that arise during the execution of these tasks. Detection and identification of these anomalies are essential steps towards fulfilling these requirements. Although several architectures have been proposed for anomaly detection, identification has not yet been thoroughly investigated. This task is challenging since indicators may appear long before anomalies are detected. In this paper, we propose a ConvoLUtional threE-stream Anomaly Identification (CLUE-AI) framework to address this problem. The framework fuses visual, auditory and proprioceptive data streams to identify everyday object manipulation anomalies. A stream of 2D images gathered through an RGB-D camera placed on the head of the robot is processed within a self-attention enabled visual stage to capture visual anomaly indicators. The auditory modality provided by the microphone placed on the robot's lower torso is processed within a designed convolutional neural network (CNN) in the auditory stage. Finally, the force applied by the gripper and the gripper state are processed within a CNN to obtain proprioceptive features. These outputs are then combined with a late fusion scheme. Our novel three-stream framework design is analyzed on everyday object manipulation tasks with a Baxter humanoid robot in a semi-structured setting. The results indicate that the framework achieves an f-score of 94%, outperforming the other baselines in classifying anomalies that arise during runtime.
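The three-stream late-fusion design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-stream feature dimensions, the classifier head sizes, the number of anomaly classes, and the score-averaging fusion rule below are all placeholder assumptions chosen only to show how per-modality scores are combined at the decision level.

import torch
import torch.nn as nn

class ThreeStreamLateFusion(nn.Module):
    """Illustrative three-stream late-fusion classifier (sketch, not CLUE-AI's code).
    Visual features (e.g., from a self-attention stage), auditory CNN features,
    and proprioceptive CNN features each pass through their own classifier head,
    and the per-stream class scores are combined by late fusion."""

    def __init__(self, vis_dim=256, aud_dim=128, prop_dim=64, n_classes=8):
        super().__init__()
        # One small classification head per modality stream (assumed sizes).
        self.vis_head = nn.Sequential(nn.Linear(vis_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))
        self.aud_head = nn.Sequential(nn.Linear(aud_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))
        self.prop_head = nn.Sequential(nn.Linear(prop_dim, 32), nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, vis_feat, aud_feat, prop_feat):
        # Late fusion: average the class scores produced independently by each stream.
        logits = (self.vis_head(vis_feat)
                  + self.aud_head(aud_feat)
                  + self.prop_head(prop_feat)) / 3.0
        return logits

# Usage example with randomly generated stand-in features for one sample.
model = ThreeStreamLateFusion()
scores = model(torch.randn(1, 256), torch.randn(1, 128), torch.randn(1, 64))
print(scores.argmax(dim=1))  # index of the predicted anomaly class

Averaging the per-stream scores is only one possible late-fusion rule; weighted sums or a learned fusion layer over the concatenated scores are common alternatives.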