Paper Title
Making Online Sketching Hashing Even Faster
Paper Authors
Paper Abstract
Data-dependent hashing methods have demonstrated good performance in various machine learning applications by learning a low-dimensional representation from the original data. However, they still face several obstacles. First, most existing hashing methods are trained in batch mode, which makes them inefficient on streaming data. Second, the computational cost and memory consumption grow sharply in the big data setting, which complicates the training procedure. Third, the lack of labeled data hinders improvement of model performance. To address these difficulties, we build on online sketching hashing (OSH) and present a FasteR Online Sketching Hashing (FROSH) algorithm that sketches the data in a more compact form via an independent transformation. We provide theoretical justification that FROSH consumes less time than OSH and achieves comparable sketching precision under the same memory cost. We also extend FROSH to a distributed implementation, DFROSH, to further reduce the training time of FROSH, and we derive a theoretical bound on its sketching precision. Finally, we conduct extensive experiments on both synthetic and real datasets to demonstrate the merits of FROSH and DFROSH.