Paper Title

Temporal Convolutional Attention-based Network For Sequence Modeling

Authors

Hongyan Hao, Yan Wang, Siqiao Xue, Yudi Xia, Jian Zhao, Furao Shen

Abstract

With the development of feed-forward models, the default choice for sequence modeling has gradually shifted away from recurrent networks. Many powerful feed-forward models based on convolutional networks and attention mechanisms have been proposed and show great potential for sequence modeling tasks. We ask whether there is an architecture that can not only serve as an approximate substitute for recurrent networks but also absorb the advantages of feed-forward models. We therefore propose an exploratory architecture, the Temporal Convolutional Attention-based Network (TCAN), which combines a temporal convolutional network with an attention mechanism. TCAN consists of two parts: Temporal Attention (TA), which captures relevant features inside the sequence, and Enhanced Residual (ER), which extracts important information from shallow layers and transfers it to deep layers. We improve the state-of-the-art bpc/perplexity results to 30.28 on word-level PTB, 1.092 on character-level PTB, and 9.20 on WikiText-2.
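The abstract names TCAN's two components but not their internals. Below is a minimal sketch, assuming PyTorch, of what one TCAN-style layer could look like; the class names (`TemporalAttention`, `TCANBlock`), the causal masking scheme, the kernel size, and the simple additive form of the Enhanced Residual are illustrative assumptions, not the authors' exact design.

```python
# Hedged sketch of a TCAN-style layer: causal self-attention (Temporal
# Attention) followed by a causal 1-D convolution, with the shallow input
# carried forward as an assumed form of the Enhanced Residual.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAttention(nn.Module):
    """Causal self-attention: each step attends only to itself and the past."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq_len, dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / x.size(-1) ** 0.5
        # Mask out future positions so the layer stays autoregressive.
        causal = torch.triu(torch.ones_like(scores), diagonal=1).bool()
        scores = scores.masked_fill(causal, float("-inf"))
        return F.softmax(scores, dim=-1) @ v


class TCANBlock(nn.Module):
    """One layer: temporal attention -> causal conv -> enhanced residual."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.attn = TemporalAttention(dim)
        self.pad = kernel_size - 1  # left-pad only, so the conv is causal
        self.conv = nn.Conv1d(dim, dim, kernel_size)

    def forward(self, x):  # x: (batch, seq_len, dim)
        h = self.attn(x)
        h = F.pad(h.transpose(1, 2), (self.pad, 0))  # (batch, dim, seq_len)
        h = self.conv(h).transpose(1, 2)
        # Enhanced Residual (assumed additive form): pass the shallow input
        # forward alongside the deep features instead of discarding it.
        return F.relu(h) + x


# Quick smoke test.
block = TCANBlock(dim=64)
y = block(torch.randn(8, 100, 64))
print(y.shape)  # torch.Size([8, 100, 64])
```

The causal mask and left-only convolution padding keep every step conditioned on past positions only, which is what lets a convolution-plus-attention stack stand in for a recurrent network on autoregressive benchmarks such as PTB and WikiText-2.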
