Title
Recognition and Prediction of Surgical Gestures and Trajectories Using Transformer Models in Robot-Assisted Surgery
Authors
Abstract
Surgical activity recognition and prediction can help provide important context in many Robot-Assisted Surgery (RAS) applications, for example, surgical progress monitoring and estimation, surgical skill evaluation, and shared control strategies during teleoperation. Transformer models were first developed for Natural Language Processing (NLP) to model word sequences, and the method soon gained popularity for general sequence modeling tasks. In this paper, we propose the novel use of a Transformer model for three tasks: gesture recognition, gesture prediction, and trajectory prediction during RAS. We modify the original Transformer architecture to generate estimates of the current gesture sequence, the future gesture sequence, and the future trajectory sequence using only the current kinematic data of the surgical robot end-effectors. We evaluate our proposed models on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) and use Leave-One-User-Out (LOUO) cross-validation to ensure the generalizability of our results. Our models achieve up to 89.3% gesture recognition accuracy, 84.6% gesture prediction accuracy (1 second ahead), and 2.71 mm trajectory prediction error (1 second ahead). Our models are comparable to, and able to outperform, state-of-the-art methods while using only the kinematic data channel. This approach can enable near-real-time surgical activity recognition and prediction.
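The core idea in the abstract, a Transformer attending over a window of end-effector kinematics to produce per-time-step gesture class logits, can be sketched minimally with a single scaled dot-product self-attention layer. This is an illustrative sketch, not the paper's architecture: the feature count, model width, window length, and number of gesture classes below are assumed placeholders (JIGSAWS defines 15 gestures; the exact kinematic feature subset used by the paper is not specified here), and the weights are random rather than trained.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # scaled dot-product self-attention over the time dimension:
    # each time step attends to every other step in the window
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))  # (T, T) attention weights
    return A @ V

# Hypothetical dimensions: T time steps in the input window,
# F kinematic features per step, model width D, G gesture classes.
rng = np.random.default_rng(0)
T, F, D, G = 30, 38, 64, 15

kinematics = rng.normal(size=(T, F))          # one window of end-effector kinematics
X = kinematics @ rng.normal(size=(F, D))      # linear embedding into model space
H = self_attention(X, *(rng.normal(size=(D, D)) for _ in range(3)))
logits = H @ rng.normal(size=(D, G))          # per-step gesture class scores
print(logits.shape)  # (30, 15)
```

A trained model would stack several such layers with feed-forward blocks and positional encodings, and a decoder variant would emit future gesture or trajectory sequences instead of per-step class scores.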