Paper Title
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
Paper Authors
Paper Abstract
We investigate composed image retrieval with text feedback. Users gradually look for the target of interest by moving from coarse to fine-grained feedback. However, existing methods merely focus on the latter, i.e., fine-grained search, by harnessing positive and negative pairs during training. This pair-based paradigm only considers the one-to-one distance between a pair of specific points, which is not aligned with the one-to-many coarse-grained retrieval process and compromises the recall rate. To fill this gap, we introduce a unified learning approach that simultaneously models coarse- and fine-grained retrieval by considering multi-grained uncertainty. The key idea underpinning the proposed method is to formulate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively. Specifically, our method contains two modules: uncertainty modeling and uncertainty regularization. (1) Uncertainty modeling simulates multi-grained queries by introducing identically distributed fluctuations in the feature space. (2) Based on the uncertainty modeling, we further introduce uncertainty regularization to adapt the matching objective according to the fluctuation range. Compared with existing methods, the proposed strategy explicitly prevents the model from pushing away potential candidates in the early stage, and thus improves the recall rate. On three public datasets, i.e., FashionIQ, Fashion200k, and Shoes, the proposed method achieves gains of +4.03%, +3.38%, and +2.40% in Recall@50 accuracy over a strong baseline, respectively.
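The two modules can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the authors' implementation: it assumes Gaussian feature-space jitter for the uncertainty modeling, and a Kendall-and-Gal-style uncertainty-weighted squared distance for the regularization, where a larger fluctuation range (a coarser query) down-weights the matching penalty so potential candidates are not pushed away aggressively.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_multigrained_query(feat, sigma):
    """Uncertainty modeling (sketch): jitter a query feature with
    identically distributed Gaussian noise; a larger sigma mimics a
    coarser-grained query."""
    return feat + rng.normal(0.0, sigma, size=feat.shape)

def uncertainty_regularized_loss(query, target, sigma):
    """Uncertainty regularization (sketch, assumed form): scale the
    squared distance by 1/(2*sigma^2) and add log(sigma) so that
    coarse queries (large sigma) incur a softer matching penalty."""
    sq_dist = np.sum((query - target) ** 2)
    return sq_dist / (2.0 * sigma ** 2) + np.log(sigma)

# Same query-target mismatch under two granularities: the coarse
# query (sigma=1.0) is penalized far less than the fine one.
query, target = np.zeros(4), np.full(4, 0.5)  # squared distance = 1.0
print(uncertainty_regularized_loss(query, target, sigma=0.1))  # ~47.70
print(uncertainty_regularized_loss(query, target, sigma=1.0))  # 0.5
```

Note the design choice in the assumed loss: without the `log(sigma)` term, the model could trivially shrink the penalty by inflating the fluctuation range, so the term acts as a regularizer against collapsing all queries to "coarse".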