Workshop track - ICLR 2018
APPLIED TIME-SERIES TRANSFER LEARNING
Nikolay Laptev, Jiafan Yu & Ram Rajagopal
Department of Electrical Engineering
Stanford University
Stanford, CA, USA
{nlaptev,jfy,ramr}@stanford.edu
ABSTRACT
Reliable and accurate time-series modeling is critical in many fields including
energy, finance, and manufacturing. Many time-series tasks, however, suffer from
a limited amount of clean training data resulting in poor forecasting, classification
or clustering performance. Recently, convolutional neural networks (CNNs) have
shown outstanding image classification performance even on tasks with small-scale
training sets. This performance can be attributed to transfer learning, enabled by the
ability of CNNs to learn rich mid-level image representations. For time series, however,
no prior work exists on general transfer learning. In this short paper, motivated by the
recent success of transfer learning in image-related tasks, we are the first to show that
an LSTM auto-encoder with attention, trained on a large-scale, pre-processed time-series
dataset, can effectively transfer time-series features across diverse domains.
Accurate time-series modeling is critical for load forecasting, financial market analysis, anomaly
detection, optimal resource allocation, budget planning, and other related tasks. While time-series
modeling has been investigated for a long time, the problem remains challenging, especially in
applications with limited or noisy history (e.g., holidays, sporting events), where practitioners are
forced to use ad hoc machine learning approaches that achieve poor performance (Wu & Olson, 2015).
Transfer learning (Pan & Yang, 2010) can address this problem. In transfer learning, we first train
a base network on a base dataset and task, and then we repurpose the learned features, or transfer
them, to a second target network to be trained on a target dataset and task. Recent findings of Bengio
(2012) show preliminary results of successfully using transfer learning on images. Motivated by
these and many other results (Yosinski et al., 2014; Huang et al., 2013; Karpathy et al., 2014;
Oquab et al., 2014b), we investigate whether transfer learning also applies to time series.
Transfer learning involves the concepts of a task and of a domain. A domain D consists of a feature
space X = {x1, ..., xn} and a marginal probability distribution P(X) over that space. Given a domain
D = {X, P(X)}, a task T is composed of a label space Y and a conditional probability distribution
P(Y|X) that is usually learned from training examples consisting of pairs xi ∈ X and yi ∈ Y.
Given a source domain DS and a source task TS as well as a target domain DT and a target task TT,
transfer learning aims to learn the target conditional probability distribution P(YT|XT) in DT from
the information learned from DS and TS. In this workshop paper we apply transfer learning to the
time-series domain, in cases where XS ≠ XT and P(YS|XS) ≠ P(YT|XT) (e.g., target domains with
limited training data, different tasks, and different time-series classes).
We propose to learn the nonlinear mapping in time-series forecasting with an attention-based LSTM
auto-encoder. We first pre-process the training data by detrending, deseasoning, and window-
normalizing. The attention mechanism identifies the parts of the time series that the model should
focus on, allowing the model to shift its focus based on the current input and on what it has produced
so far. Together, time-series pre-processing and the LSTM auto-encoder with attention allow us to
successfully learn and transfer time-series features to a wide variety of tasks.
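As an illustration of this pre-processing step, the minimal sketch below detrends, deseasons, and window-normalizes a univariate series. The linear detrending, the hourly seasonal profile, the window length, and all function names are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def preprocess(series, season_length=24, window=168):
    """Detrend, deseason, and window-normalize a 1-D series (illustrative sketch)."""
    x = np.asarray(series, dtype=float)

    # Detrend: remove a linear least-squares fit (assumed form of detrending).
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    x = x - (slope * t + intercept)

    # Deseason: subtract the mean profile of each seasonal position (e.g., hour of day).
    seasonal = np.array([x[i::season_length].mean() for i in range(season_length)])
    x = x - seasonal[t % season_length]

    # Window-normalize: z-score each fixed-length window independently.
    windows = []
    for start in range(0, len(x) - window + 1, window):
        w = x[start:start + window]
        windows.append((w - w.mean()) / (w.std() + 1e-8))
    return np.stack(windows)

# Example: two weeks of synthetic hourly load with trend and daily seasonality.
demo = 10 + 0.01 * np.arange(336) + np.sin(np.arange(336) * 2 * np.pi / 24)
print(preprocess(demo).shape)  # (2, 168)
```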
In this workshop paper, we show that a model based on a long short-term memory (LSTM)
(Hochreiter & Schmidhuber, 1997) auto-encoder with attention (Vaswani et al., 2017), trained on
a large-scale, pre-processed time-series dataset, can effectively transfer features across diverse
domains. The use of an LSTM-based model is motivated by its continued success in modeling
sequences¹. The attention mechanism improves embedding efficiency by attending to specific parts
of the input. During training, we found that pre-processing the time series by detrending, deseasoning,
and normalizing is critical for dealing with diverse target domains. We systematically demonstrate
that the LSTM auto-encoder with attention has potential in feature extraction, classification,
disaggregation, and forecasting, by testing on multiple cross-domain datasets.
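The following PyTorch sketch shows the kind of architecture described above: an LSTM encoder, a simple dot-product attention over the encoder states, and an LSTM decoder that reconstructs the pre-processed window, with the final encoder state serving as the transferable embedding. Layer sizes, the attention variant, and the training details are assumptions for illustration, not the authors' exact model.

```python
import torch
import torch.nn as nn

class LSTMAutoencoderWithAttention(nn.Module):
    """Sketch: encode a window, attend over encoder states, reconstruct the window."""
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_features)

    def forward(self, x):
        # x: (batch, time, n_features), e.g., pre-processed windows.
        enc_states, (h, c) = self.encoder(x)                        # (batch, time, hidden)
        dec_states, _ = self.decoder(x, (h, c))                     # teacher-forced reconstruction
        # Dot-product attention: each decoder step attends over all encoder states.
        scores = torch.bmm(dec_states, enc_states.transpose(1, 2))  # (batch, time, time)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_states)                    # (batch, time, hidden)
        return self.out(torch.cat([dec_states, context], dim=-1))

    def embed(self, x):
        # Fixed-length embedding used as the transferable time-series feature.
        _, (h, _) = self.encoder(x)
        return h[-1]                                                # (batch, hidden)

model = LSTMAutoencoderWithAttention()
window = torch.randn(8, 168, 1)                       # batch of pre-processed windows
loss = nn.functional.mse_loss(model(window), window)  # reconstruction objective
loss.backward()
```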
Abstract Feature Extraction from LSTM for Time Series Classification We first show that the
proposed attention-based LSTM auto-encoder can successfully extract generalized features from
time series. Such features are extracted without labels and then used for time-series classification,
where we show substantial improvements in accuracy over using only traditional time-series
features/statistics. We use the standard UCR time-series classification dataset (Chen et al., 2015).
Classifier accuracy is shown in Figure 1a, where the y-axis is accuracy and each bar corresponds to
one category. The classifier is trained using either only standard time-series features (e.g., variance,
seasonality, trend) or the abstract features extracted by the proposed attention-based auto-encoder.
Apart from the features, all other settings (machine learning model, training/testing size, etc.) are
identical for the two classifiers. The top of the blue bar marks the accuracy of the classifier with
standard features, and the top of the red bar marks the accuracy of the classifier with auto-encoder
features; the length of the red bar therefore shows the substantial boost in performance relative to
the baseline of standard time-series features.
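The comparison can be sketched as follows, reusing the `model` object and 168-step pre-processed windows from the snippets above; the hand-crafted feature list, the random-forest classifier, and the placeholder data (standing in for a labelled UCR subset) are illustrative assumptions.

```python
import numpy as np
import torch
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def standard_features(batch):
    """Baseline: simple per-series statistics (variance, trend slope, lag-1 autocorrelation)."""
    feats = []
    for x in batch:
        t = np.arange(len(x))
        slope = np.polyfit(t, x, 1)[0]
        ac1 = np.corrcoef(x[:-1], x[1:])[0, 1]
        feats.append([x.var(), slope, ac1])
    return np.array(feats)

def autoencoder_features(batch, model):
    """Proposed: fixed-length embeddings from the pre-trained encoder."""
    with torch.no_grad():
        xb = torch.tensor(batch, dtype=torch.float32).unsqueeze(-1)
        return model.embed(xb).numpy()

# X: (n_series, window) pre-processed series, y: class labels (UCR-style); placeholders here.
X, y = np.random.randn(100, 168), np.random.randint(0, 3, 100)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("standard features:", cross_val_score(clf, standard_features(X), y).mean())
print("auto-encoder features:", cross_val_score(clf, autoencoder_features(X, model), y).mean())
```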
Transfer Learning for Time Series Disaggregation We next show the transferability of the learned
time-series features for disaggregation. In particular, we aim to estimate individual appliance usage
given an aggregate power consumption signal. We compare the disaggregation performance with
and without the attention-based auto-encoder framework using the Pecan Street dataset (Pecan
Street Inc., 2017), which contains hourly measurements of the power consumption of homes and
individual appliances from 345 homes in 2016, mainly located in Austin, Texas, each with a complete
record for at least one appliance. We use the model pre-trained on a large, generalized dataset and
then fine-tune the last prediction layer for each individual time series on the disaggregation task.
The individual time series differ from any time series in the generalized dataset. The disaggregation
performance is shown in Figure 1b, whose layout is similar to that of Figure 1a. With transfer
learning, disaggregation accuracy is substantially increased across appliance types.
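This recipe can be sketched as follows, reusing the pre-trained encoder from the earlier snippet: the transferred features are frozen and only a small per-appliance prediction head is updated. The head shape, the optimizer settings, and the placeholder data are assumptions.

```python
import torch
import torch.nn as nn

# Freeze the pre-trained auto-encoder so only the new prediction head is updated.
for p in model.parameters():
    p.requires_grad = False

head = nn.Linear(64, 168)                        # embedding -> per-hour appliance usage (assumed head)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

def finetune_step(aggregate_window, appliance_window):
    # aggregate_window: (batch, 168, 1) aggregate load; appliance_window: (batch, 168) target appliance.
    z = model.embed(aggregate_window)            # frozen, transferable features
    loss = nn.functional.mse_loss(head(z), appliance_window)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

agg = torch.randn(16, 168, 1)                    # placeholder aggregate consumption windows
app = torch.randn(16, 168)                       # placeholder appliance consumption windows
print(finetune_step(agg, app))
```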
Transfer Learning for Time Series Forecasting We also demonstrate the transferability of time-
series forecasting models. To train a forecasting model with transfer learning, we again start from
an LSTM-based forecasting model pre-trained on a large dataset. Then, for any individual time
series (not from the large dataset), we fine-tune the fully connected layers of the LSTM model on the
individual data. We use a large-scale electricity consumption dataset containing 116,000 anonymous
residential electricity loads at hourly granularity from Pacific Gas and Electric Company (the PG&E
dataset). The 116,000 time series come from 13 different climate zones and exhibit large diversity
(Kwac et al., 2014). We also test the transferability of time-series forecasting on the standard M3
dataset. The results are shown in Figure 1c and Figure 1d, where the y-axis is the symmetric mean
absolute percentage error (SMAPE). We compare performance on both in-sample training data and
out-of-sample test data. The consistent improvements of transfer learning (red bars) over no transfer
learning (blue bars) show competitive performance even on short time series such as those in the
M3 dataset.
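For reference, a minimal sketch of SMAPE in its common symmetric form (the standard textbook definition, used here for illustration):

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (standard definition)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(forecast - actual) / np.where(denom == 0, 1.0, denom))

print(smape([100, 200, 300], [110, 190, 310]))  # ~6.0
```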
In summary, we present an attention-based auto-encoder architecture for time-series feature
extraction, trained on a large-scale, pre-processed time-series dataset, which we plan to make
publicly available as a pre-trained model for time-series applications, analogous to the pre-trained
image models that exist today (Oquab et al., 2014a). We also show the transferability of the learned
time-series features to classification, disaggregation, and forecasting tasks. We are currently focusing
on providing this work as a service that practitioners can use as an online tool for time-series feature
generation, or as an offline pre-trained model serving as a prior for time-series machine learning tasks.
¹ Laptev et al. (2017) show that an LSTM forecasting model is able to outperform classical time-series
methods in cases with long, interdependent time series.
Figure 1: (a) Performance comparison on the classification/clustering task on the UCR dataset by
directly using the learned features in traditional machine learning models. (b) Transfer learning
applied to the disaggregation task. (c), (d) Forecasting task evaluation leveraging transfer learning
on new datasets.
REFERENCES
Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. ICML
Workshop on Unsupervised and Transfer Learning, pp. 17–36, 2012.
Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen,
and Gustavo Batista. The UCR time series classification archive, July 2015. www.cs.ucr.edu/
~eamonn/time_series_data/.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):
1735–1780, 1997.
Jui-Ting Huang, Jinyu Li, Dong Yu, Li Deng, and Yifan Gong. Cross-language knowledge transfer
using multilingual deep neural network with shared hidden layers. IEEE International Conference
on Acoustics, Speech and Signal Processing, pp. 7304–7308, 2013.
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-
Fei. Large-scale video classification with convolutional neural networks. IEEE conference on
Computer Vision and Pattern Recognition, pp. 1725–1732, 2014.
Jungsuk Kwac, June Flora, and Ram Rajagopal. Household energy consumption segmentation using
hourly data. IEEE Transactions on Smart Grid, 5(1):420–430, 2014.
Nikolay Laptev, Jason Yosinski, Li Erran Li, and Slawek Smyl. Time-series extreme event forecast-
ing with neural networks at Uber. International Conference on Machine Learning, 2017.
M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representa-
tions using convolutional neural networks. In CVPR, 2014a.
Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. Learning and transferring mid-level
image representations using convolutional neural networks. IEEE Conference on Computer Vision
and Pattern Recognition, pp. 1717–1724, 2014b.
Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge
and Data Engineering, 22(10):1345–1359, 2010.
Pecan Street Inc. Dataport from pecan street, 2017. URL https://dataport.cloud/.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez,
Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information
Processing Systems, 2017. URL https://arxiv.org/pdf/1706.03762.pdf.
Desheng Dash Wu and David L Olson. Financial risk forecast using machine learning and sentiment
analysis, 2015.
Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep
neural networks? Advances in Neural Information Processing Systems, pp. 3320–3328, 2014.