Paper Title
A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting
Paper Authors
Paper Abstract
Time series forecasting is widely used in equipment life-cycle prediction, weather forecasting, traffic flow prediction, and other domains. Recently, some researchers have applied the Transformer to time series forecasting because of its powerful parallel training ability. However, existing Transformer-based methods do not pay enough attention to the short time segments that play a decisive role in prediction, which makes them insensitive to small changes that affect the trend of a time series and unable to effectively learn continuous time-dependent features. To address this problem, we propose a differential attention fusion model based on the Transformer, which adds a differential layer, neighbor attention, a sliding fusion mechanism, and a residual layer to the classical Transformer architecture. Specifically, the differences between adjacent time points are extracted and emphasized by the differential layer and neighbor attention. The sliding fusion mechanism fuses the various features of each time point so that the data can participate in encoding and decoding without losing important information. The residual layer, comprising convolution and LSTM, further learns the dependencies between time points and enables deeper training of our model. Extensive experiments on three datasets show that the predictions produced by our method are comparable to the state of the art.
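The abstract does not specify the exact form of the differential layer or neighbor attention, but the general idea of (1) taking first-order differences of adjacent time points and (2) attending only over a small local window of neighbors can be sketched as follows. This is a minimal illustration under our own simplified assumptions (scalar series, similarity-based scores), not the paper's actual implementation.

```python
import math

def difference(series):
    # Differential step: first-order differences of adjacent time points,
    # exposing the small local changes the model is meant to focus on.
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def neighbor_attention(values, window=1):
    # Simplified "neighbor attention": each position attends only to its
    # +/- window neighbors, with softmax weights scored by how similar
    # the neighboring values are (an assumption for illustration).
    out = []
    for t in range(len(values)):
        lo, hi = max(0, t - window), min(len(values), t + window + 1)
        scores = [-abs(values[t] - values[j]) for j in range(lo, hi)]
        m = max(scores)  # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum((w / z) * values[j]
                       for w, j in zip(weights, range(lo, hi))))
    return out

series = [1.0, 1.1, 1.2, 2.0, 2.1]
diffs = difference(series)            # the jump 1.2 -> 2.0 stands out as ~0.8
smoothed = neighbor_attention(diffs)  # locally re-weighted differences
```

In a full model, the attended differences would be fused with the raw features (the sliding fusion step) before entering the Transformer encoder; here they are only computed to show the focusing effect of differencing plus local attention.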