Paper Title

Chefs' Random Tables: Non-Trigonometric Random Features

Authors

Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

Abstract

We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels. CRTs are an alternative to standard random kitchen sink (RKS) methods, which inherently rely on the trigonometric maps. We present variants of CRTs where RFs are positive, a key requirement for applications in recent low-rank Transformers. Further variance reduction is possible by leveraging statistics which are simple to compute. One instantiation of CRTs, the optimal positive random features (OPRFs), is to our knowledge the first RF method for unbiased softmax kernel estimation with positive and bounded RFs, resulting in exponentially small tails and much lower variance than its counterparts. As we show, orthogonal random features applied in OPRFs provide additional variance reduction for any dimensionality $d$ (not only asymptotically for sufficiently large $d$, as for RKS). We test CRTs on many tasks ranging from non-parametric classification to training Transformers for text, speech and image data, obtaining new state-of-the-art results for low-rank text Transformers, while providing linear space and time complexity.
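To illustrate the notion of positive random features for unbiased softmax kernel estimation that the abstract refers to, the sketch below implements the earlier positive-RF estimator φ(x) = exp(wᵀx − ‖x‖²/2)/√m with Gaussian w (the baseline that OPRF improves upon), not the paper's OPRF/CRT mechanism itself; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def positive_random_features(x, W):
    """Positive random features for the softmax kernel exp(x . y).

    phi(x)_i = exp(w_i . x - ||x||^2 / 2) / sqrt(m), with w_i ~ N(0, I_d).
    Since E_w[exp(w . (x + y))] = exp(||x + y||^2 / 2), the inner product
    phi(x) . phi(y) is an unbiased estimate of exp(x . y), and every
    feature is strictly positive (the key requirement for low-rank
    Transformer attention mentioned in the abstract).
    """
    m = W.shape[0]
    return np.exp(x @ W.T - np.sum(x**2) / 2) / np.sqrt(m)

rng = np.random.default_rng(0)
d, m = 8, 20000                      # data dimension, number of features
x = rng.normal(size=d) / np.sqrt(d)  # keep norms moderate so variance stays small
y = rng.normal(size=d) / np.sqrt(d)
W = rng.normal(size=(m, d))          # i.i.d. Gaussian projection directions

est = positive_random_features(x, W) @ positive_random_features(y, W)
exact = np.exp(x @ y)
```

With moderate input norms and m = 20000 features, `est` lands close to `exact`; the paper's contribution (OPRF) further adds boundedness of the features, exponentially small tails, and variance reduction from orthogonal projections at any dimensionality d.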
