Compared to ARIMA, LSTM runs WAY FASTER.
It uses the same idea: forecast the value at step n+1 from the previous n lagged values.
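The lag-window idea can be sketched as a small dataset builder. Each input is n consecutive daily values, and the target is the value that follows them. This is a minimal pure-Python sketch. The function name `make_lag_windows` and the sample visit counts are hypothetical and are not taken from our pipeline.

```python
def make_lag_windows(series, n):
    """Hypothetical helper: turn a 1-D series into (inputs, targets).

    Each input holds the n values at steps t-n .. t-1, and the matching
    target is the value at step t (the "n+1"-th value of the window).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    inputs, targets = [], []
    for t in range(n, len(series)):
        inputs.append(series[t - n:t])  # the previous n lagged values
        targets.append(series[t])       # the next value to forecast
    return inputs, targets


# Made-up daily visit counts for one page.
visits = [10, 12, 15, 14, 18, 21]
X, y = make_lag_windows(visits, 3)
# X == [[10, 12, 15], [12, 15, 14], [15, 14, 18]]
# y == [14, 18, 21]
```

Because the windows do not depend on which page they come from, windows from every page can be stacked into one training set. That is what lets a single LSTM cover all pages, whereas ARIMA is fit page by page.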
However, contrary to what we expected,
LSTM did not perform as well as we had hoped. (But WHY?)
Still, we definitely cannot rely on ARIMA forever.
Let's look at the strengths and weaknesses of the two models!
ARIMA, advantages:
● Relatively good SMAPE score (SMAPE measures error, so lower is better)
Ends up with a mean SMAPE of 39.6649.
● Works well for short-term forecasting
Suited to short-run forecasts on high-frequency data.
LSTM, advantages:
● A LOT FASTER
One model covers all 145k pages at once and takes just 20 minutes for 20 epochs.
● Less sensitive to non-stationary data.
ARIMA, weaknesses:
● High cost and SUPER SLOW
About 1 minute per page, which adds up to roughly 100 days for all 145k pages.
● Unstable
Too many assumptions must hold before using it, and warnings are raised if any of them is violated.
LSTM, weaknesses:
● Relatively worse score, possibly because a single model is shared by all pages?
● Starts to forget what happened long ago (the limit is 400 days)
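Since the comparison hinges on SMAPE, here is a minimal sketch of the metric. It assumes the common symmetric form, 200/n · Σ |F − A| / (|A| + |F|), reported in percent (0 is perfect, 200 is the worst). It also assumes that a day where both the actual and the forecast are 0 counts as zero error. That convention is an assumption here, because definitions differ. The function name `smape` is our own.

```python
def smape(actual, forecast):
    """Mean symmetric MAPE in percent; lower is better (0 = perfect, 200 = worst)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Assumed convention: when actual and forecast are both 0, the term is 0.
        if denom != 0:
            total += 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


print(smape([100, 200], [110, 180]))  # about 10.03
```

Note that forecasting 0 for a day whose true value is nonzero contributes the maximum term of 200 for that day, so sparse, low-traffic pages can move the mean score a lot.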
Fill in missing values with a Fibonacci sequence, which assumes visits increase gradually.
Try fitting a smoothing spline analysis of variance (SSANOVA) model.
We will need to be careful about overfitting.
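To illustrate the smoothing-versus-overfitting trade-off, here is a minimal sketch using a Whittaker smoother. This is a discrete penalized least-squares smoother closely related to smoothing splines. It is a swapped-in, much simpler technique, not SSANOVA itself. The helper name `whittaker_smooth` is hypothetical. The penalty weight `lam` acts as the smoothing parameter: `lam = 0` reproduces the data exactly (maximal overfitting), and a very large `lam` pushes the fit toward a straight line.

```python
def whittaker_smooth(y, lam):
    """Hypothetical illustration (not SSANOVA): penalized least squares.

    Minimizes sum((y - z)^2) + lam * sum((second difference of z)^2), lam >= 0,
    by solving (I + lam * D^T D) z = y, where D is the (n-2) x n
    second-difference matrix. Dense elimination: short series only.
    """
    n = len(y)
    # Start from the identity matrix, then add lam * D^T D row by row of D.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        row_k = {k: 1.0, k + 1: -2.0, k + 2: 1.0}  # row k of D
        for i, ci in row_k.items():
            for j, cj in row_k.items():
                a[i][j] += lam * ci * cj
    b = [float(v) for v in y]
    # Gaussian elimination; the matrix is symmetric positive definite,
    # so no pivoting is needed.
    for col in range(n):
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            if factor != 0.0:
                for j in range(col, n):
                    a[row][j] -= factor * a[col][j]
                b[row] -= factor * b[col]
    # Back substitution for z.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (b[i] - s) / a[i][i]
    return z


noisy = [0, 10, 0, 10, 0, 10, 0]  # made-up, very jagged daily visits
smooth = whittaker_smooth(noisy, 10.0)
```

In practice the smoothing parameter would be chosen on held-out days or by cross-validation rather than by eye. That is one way to keep the overfitting risk in check.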
LSTM performs poorly because it easily forgets long-term trends.
Try using attention to fix this problem.
Attention can bring useful information from the distant past to the current RNN cell.
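The attention idea can be sketched with plain dot-product attention. The current hidden state acts as a query, and every earlier hidden state is scored against it. A softmax then turns those scores into weights for a context vector. This is a generic illustration under our own assumptions (dot-product scoring, no learned projections). It is not the architecture we used or any specific library's API. The function names and hidden-state values are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, past_states):
    """Hypothetical sketch: return (context, weights) over past hidden states.

    query: the current hidden state (list of floats)
    past_states: hidden states from earlier steps, however far back
    """
    # Score each past state by its dot product with the current state.
    scores = [sum(q * h for q, h in zip(query, state)) for state in past_states]
    weights = softmax(scores)
    # Each past state contributes in proportion to its weight,
    # no matter how many steps ago it occurred.
    context = [
        sum(w * state[i] for w, state in zip(weights, past_states))
        for i in range(len(query))
    ]
    return context, weights


# The oldest state matches the query best, so it dominates the context.
past = [[5.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
context, weights = dot_product_attention([1.0, 0.0], past)
# weights[0] is about 0.987
```

Unlike the LSTM's cell state, nothing here decays with distance. A state from hundreds of steps back gets a large weight whenever it is relevant to the current query.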