Zero-Shot Retargeting of Learned Quadruped Locomotion Policies Using Hybrid Kinodynamic Model Predictive Control

Paper Title

Zero-Shot Retargeting of Learned Quadruped Locomotion Policies Using Hybrid Kinodynamic Model Predictive Control

Authors

He Li, Tingnan Zhang, Wenhao Yu, Patrick M. Wensing

Abstract

Reinforcement Learning (RL) has witnessed great strides for quadruped locomotion, with continued progress in the reliable sim-to-real transfer of policies. However, it remains a challenge to reuse a policy on another robot, which could save time for retraining. In this work, we present a framework for zero-shot policy retargeting wherein diverse motor skills can be transferred between robots of different shapes and sizes. The new framework centers on a planning-and-control pipeline that systematically integrates RL and Model Predictive Control (MPC). The planning stage employs RL to generate a dynamically plausible trajectory as well as the contact schedule, avoiding the combinatorial complexity of contact sequence optimization. This information is then used to seed the MPC to stabilize and robustify the policy roll-out via a new Hybrid Kinodynamic (HKD) model that implicitly optimizes the foothold locations. Hardware results show an ability to transfer policies from both the A1 and Laikago robots to the MIT Mini Cheetah robot without requiring any policy re-tuning.
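
As a rough illustration of what a contact schedule handed from a planner to an MPC might look like, the sketch below converts per-step foot contact flags into stance intervals. It is a minimal sketch and not the authors' implementation: the function name `contact_schedule`, the leg labels, the time step, and the sample rollout are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple


def contact_schedule(
    contacts: Dict[str, List[bool]], dt: float
) -> Dict[str, List[Tuple[float, float]]]:
    """Convert per-step contact flags into stance intervals (start, end) in seconds.

    `contacts` maps a leg name to one boolean per time step (True = foot on ground).
    This data layout is an assumption made for illustration only.
    """
    schedule: Dict[str, List[Tuple[float, float]]] = {}
    for leg, flags in contacts.items():
        intervals: List[Tuple[float, float]] = []
        start = None  # index of the step at which the current stance phase began
        for k, in_contact in enumerate(flags):
            if in_contact and start is None:
                start = k  # touchdown
            elif not in_contact and start is not None:
                intervals.append((start * dt, k * dt))  # liftoff closes the interval
                start = None
        if start is not None:
            # The foot is still on the ground at the end of the rollout.
            intervals.append((start * dt, len(flags) * dt))
        schedule[leg] = intervals
    return schedule


# Hypothetical two-leg rollout with alternating stance phases.
rollout = {
    "FR": [True, True, False, False, True, True],
    "FL": [False, False, True, True, False, False],
}
schedule = contact_schedule(rollout, dt=0.5)
print(schedule)
```

With the contact timing fixed in this way, a downstream optimizer would only need to handle continuous quantities such as forces and foothold locations, which is the sense in which the abstract describes avoiding the combinatorial complexity of contact sequence optimization.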
