Paper Title
Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Authors
Abstract
Audio-visual correlation learning aims to capture the essential correspondences between audio and video and to understand the natural phenomena that link them. With the rapid growth of deep learning, this emerging research problem has attracted increasing attention. Over the past few years, various methods and datasets have been proposed for audio-visual correlation learning, which motivates us to conduct a comprehensive survey. This survey focuses on state-of-the-art (SOTA) models used to learn correlations between audio and video, and also discusses the task definitions and paradigms applied in AI multimedia. In addition, we investigate objective functions frequently used to optimize audio-visual correlation learning models and discuss how audio-visual data is exploited in the optimization process. Most importantly, we provide an extensive comparison and summary of recent progress in SOTA audio-visual correlation learning and discuss future research directions.