Paper Title

Toddler-Guidance Learning: Impacts of Critical Period on Multimodal AI Agents

Authors

Park, Junseok; Park, Kwanyoung; Oh, Hyunseok; Lee, Ganghun; Lee, Minsu; Lee, Youngki; Zhang, Byoung-Tak

Abstract

Critical periods are phases during which a toddler's brain develops in spurts. Proper guidance during this stage is essential for promoting children's cognitive development. However, it is not clear whether such a critical period also exists in the training of AI agents. As with human toddlers, well-timed guidance and multimodal interactions might significantly enhance the training efficiency of AI agents. To test this hypothesis, we adapt the notion of critical periods to learning in AI agents and investigate the critical period in a virtual environment for AI agents. We formalize the critical period and Toddler-guidance learning in the reinforcement learning (RL) framework. We then build a toddler-like environment with the VECA toolkit to mimic human toddlers' learning characteristics. We study three discrete levels of mutual interaction: weak mentor guidance (sparse reward), moderate mentor guidance (helper reward), and mentor demonstration (behavioral cloning). We also introduce the EAVE dataset, consisting of 30,000 real-world images, to fully reflect the toddler's viewpoint. We evaluate the impact of critical periods on AI agents from two perspectives: how and when they are best guided in both uni- and multimodal learning. Our experimental results show that both uni- and multimodal agents with moderate mentor guidance and a critical period at 1 million and 2 million training steps show a noticeable improvement. We validate these results with transfer learning on the EAVE dataset and find performance improvements under the same critical period and guidance.
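
As a rough illustration of how a critical period could gate moderate mentor guidance, the Python sketch below adds a helper reward to the sparse environment reward only while the training step falls inside a window. The names `CriticalPeriod` and `guided_reward`, the additive combination of rewards, and the window bounds are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalPeriod:
    """Hypothetical training-step window in which mentor guidance is active."""
    start_step: int  # inclusive
    end_step: int    # exclusive

    def contains(self, step: int) -> bool:
        return self.start_step <= step < self.end_step


def guided_reward(step: int, sparse_reward: float, helper_reward: float,
                  period: CriticalPeriod) -> float:
    """Return the sparse reward, plus the helper reward inside the window.

    Assumption: guidance is combined additively; other schemes are possible.
    """
    if period.contains(step):
        return sparse_reward + helper_reward
    return sparse_reward


# Illustrative window only; the bounds are not taken from the paper.
period = CriticalPeriod(start_step=1_000_000, end_step=2_000_000)
inside = guided_reward(1_500_000, 1.0, 0.5, period)   # helper reward applied
outside = guided_reward(2_500_000, 1.0, 0.5, period)  # sparse reward only
```

Shifting `start_step` and `end_step` across runs is one simple way to compare when guidance helps most, under the same assumptions.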
