Paper Title
Matching Neuromorphic Events and Color Images via Adversarial Learning
Paper Authors
Paper Abstract
The event camera has appealing properties: high dynamic range, low latency, low power consumption, and low memory usage, and thus complements conventional frame-based cameras. It captures only the dynamics of a scene and can record almost "continuous" motion. However, unlike frame-based cameras, which reflect the whole appearance of a scene, the event camera discards the detailed characteristics of objects, such as texture and color. To take advantage of both modalities, the event camera and the frame-based camera are combined for various machine vision tasks, and cross-modal matching between neuromorphic events and color images then plays a vital role. In this paper, we propose the Event-Based Image Retrieval (EBIR) problem to exploit this cross-modal matching task. Given an event stream depicting a particular object as the query, the aim is to retrieve color images containing the same object. This problem is challenging because there is a large modality gap between neuromorphic events and color images. We address the EBIR problem by proposing neuromorphic Events-Color image Feature Learning (ECFL). In particular, adversarial learning is employed to jointly model neuromorphic events and color images in a common embedding space. We also contribute the N-UKbench and EC180 datasets to the community to promote the development of the EBIR problem. Extensive experiments on our datasets show that the proposed method is superior in learning effective modality-invariant representations that link the two modalities.
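
The abstract describes mapping the two modalities into a common embedding space via adversarial learning. Below is a minimal PyTorch sketch of that general idea; the encoder architectures, feature dimensions, loss weighting, and training step are illustrative assumptions, not the paper's actual ECFL implementation.

```python
# Minimal sketch of adversarial cross-modal embedding learning
# (illustrative assumptions throughout; not the ECFL architecture).
import torch
import torch.nn as nn

EMBED_DIM = 128  # assumed shared embedding size

class Encoder(nn.Module):
    """Maps a modality-specific feature vector into the shared embedding space."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, EMBED_DIM),
        )

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)

class ModalityDiscriminator(nn.Module):
    """Predicts whether an embedding came from the event or the image encoder."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMBED_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, z):
        return self.net(z)

# Assumed 512-d pre-extracted features per modality.
event_enc, image_enc = Encoder(in_dim=512), Encoder(in_dim=512)
disc = ModalityDiscriminator()
bce = nn.BCEWithLogitsLoss()
opt_enc = torch.optim.Adam(
    [*event_enc.parameters(), *image_enc.parameters()], lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)

def train_step(event_feats, image_feats):
    """One adversarial step: the discriminator learns to tell modalities apart,
    while the encoders learn to fool it, pushing both modalities into one space."""
    z_e, z_i = event_enc(event_feats), image_enc(image_feats)
    # 1) Update the discriminator: events -> label 0, images -> label 1.
    d_loss = (bce(disc(z_e.detach()), torch.zeros(len(z_e), 1)) +
              bce(disc(z_i.detach()), torch.ones(len(z_i), 1)))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
    # 2) Update the encoders so the discriminator cannot tell event embeddings
    #    from image embeddings, plus an alignment term keeping paired
    #    event/image embeddings close.
    z_e, z_i = event_enc(event_feats), image_enc(image_feats)
    adv_loss = bce(disc(z_e), torch.ones(len(z_e), 1))
    align_loss = (z_e - z_i).pow(2).sum(dim=1).mean()
    enc_loss = adv_loss + align_loss
    opt_enc.zero_grad(); enc_loss.backward(); opt_enc.step()
    return d_loss.item(), enc_loss.item()
```

In a full retrieval system, the linear encoders would be replaced by networks suited to each modality (e.g., a CNN over an event-stream representation and an image CNN), and retrieval would rank gallery images by embedding distance to the query event stream.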