Paper Title
Enforcing Predictive Invariance across Structured Biomedical Domains
Paper Authors
Paper Abstract
Many biochemical applications, such as molecular property prediction, require models to generalize beyond their training domains (environments). Moreover, natural environments in these tasks are structured, defined by complex descriptors such as molecular scaffolds or protein families. Therefore, most environments are either never seen during training or contain only a single training example. To address these challenges, we propose a new regret minimization (RGM) algorithm and its extension for structured environments. RGM builds on invariant risk minimization (IRM) by recasting the simultaneous optimality condition in terms of predictive regret, finding a representation that enables the predictor to compete against an oracle with hindsight access to held-out environments. The structured extension adaptively highlights variation due to complex environments via specialized domain perturbations. We evaluate our method on multiple applications (molecular property prediction, protein homology prediction, and stability prediction) and show that RGM significantly outperforms previous state-of-the-art baselines.
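To make the notion of predictive regret concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the predictor class or the loss, so the sketch assumes a one-dimensional, already-computed feature, a no-intercept linear predictor, and squared error. The names `fit_slope`, `mse`, and `regret` are hypothetical. Regret for a held-out environment is taken as the loss of a predictor fitted on the remaining environments minus the loss of an oracle predictor fitted on the held-out environment itself. In RGM the representation is learned so that this gap is small; here the representation is held fixed.

```python
from typing import Dict, List, Tuple

# One environment: a list of (feature value, label) pairs.
# The feature stands in for a fixed representation phi(x) (assumption).
Env = List[Tuple[float, float]]


def fit_slope(data: Env) -> float:
    """Least-squares slope w for the no-intercept model y ~ w * phi."""
    sxx = sum(p * p for p, _ in data)
    sxy = sum(p * y for p, y in data)
    return sxy / sxx if sxx > 0 else 0.0


def mse(w: float, data: Env) -> float:
    """Mean squared error of the predictor w * phi on one environment."""
    return sum((w * p - y) ** 2 for p, y in data) / len(data)


def regret(envs: Dict[str, Env], held_out: str) -> float:
    """Loss on the held-out environment of a predictor fitted without it,
    minus the loss of an oracle fitted on that environment in hindsight."""
    rest = [pt for name, d in envs.items() if name != held_out for pt in d]
    w_rest = fit_slope(rest)
    w_oracle = fit_slope(envs[held_out])
    return mse(w_rest, envs[held_out]) - mse(w_oracle, envs[held_out])


# The feature predicts the label identically in both environments.
invariant = {"a": [(1.0, 2.0), (2.0, 4.0)], "b": [(3.0, 6.0), (-1.0, -2.0)]}
# The feature-label relationship flips between environments.
shifted = {"a": [(1.0, 2.0), (2.0, 4.0)], "b": [(1.0, -1.0), (2.0, -2.0)]}

print(regret(invariant, "b"))  # 0.0
print(regret(shifted, "b"))    # 22.5
```

Because the oracle minimizes squared error on the held-out environment over the same predictor class, the regret in this sketch is never negative. It is zero when the feature supports the same predictor in every environment and grows when the relationship changes across environments.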