Paper Title

Differential Assessment of Black-Box AI Agents

Authors

Rashmeet Kaur Nayyar, Pulkit Verma, Siddharth Srivastava

Abstract

Much of the research on learning symbolic models of AI agents focuses on agents with stationary models. This assumption fails to hold in settings where the agent's capabilities may change as a result of learning, adaptation, or other post-deployment modifications. Efficient assessment of agents in such settings is critical for learning the true capabilities of an AI system and for ensuring its safe usage. In this work, we propose a novel approach to "differentially" assess black-box AI agents that have drifted from their previously known models. As a starting point, we consider the fully observable and deterministic setting. We leverage sparse observations of the drifted agent's current behavior and knowledge of its initial model to generate an active querying policy that selectively queries the agent and computes an updated model of its functionality. Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch. We also show that the cost of differential assessment using our method is proportional to the amount of drift in the agent's functionality.
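The core idea in the abstract, comparing sparse observations of the drifted agent against the previously known model to decide which capabilities need active querying, can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a STRIPS-like action model where each action maps to (preconditions, add effects, delete effects) over sets of propositions, and the names `Obs`, `predicted_next`, and `actions_to_query` are hypothetical.

```python
from collections import namedtuple

# One observed transition of the (possibly drifted) black-box agent.
# States are frozensets of ground propositions.
Obs = namedtuple("Obs", "before action after")

def predicted_next(state, model, action):
    """Apply the known (pre, add, delete) model of `action` to `state`.
    Returns None if the action's preconditions do not hold in `state`."""
    pre, add, delete = model[action]
    if not pre <= state:
        return None
    return (state - delete) | add

def actions_to_query(initial_model, observations):
    """Return the actions whose observed behavior contradicts the initial
    model. Only these candidates need active querying; actions consistent
    with all observations keep their previously learned model, which is
    why the assessment cost scales with the amount of drift."""
    drifted = set()
    for obs in observations:
        expected = predicted_next(obs.before, initial_model, obs.action)
        if expected != obs.after:
            drifted.add(obs.action)
    return drifted

# Example: after drift, "move" no longer deletes at_a, so the observed
# successor state disagrees with the initial model's prediction.
model = {"move": (frozenset({"at_a"}),   # preconditions
                  frozenset({"at_b"}),   # add effects
                  frozenset({"at_a"}))}  # delete effects
obs = [Obs(frozenset({"at_a"}), "move", frozenset({"at_a", "at_b"}))]
print(actions_to_query(model, obs))  # -> {'move'}
```

In the paper's setting the flagged actions would then drive an active querying policy that asks the agent targeted questions to recompute just those parts of the model, rather than re-learning every action from scratch.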
