The dataset is from the Kaggle competition
Web Traffic Time Series Forecasting from 07/13/17 to 11/15/17.
Each row represents a visit series of a page from 07/31/15
to 12/31/16 (for training set 1) and to 09/01/17 (for training set 2).
name_project_access_agent
● Name: page name
● Project: website language
Deutsch(de), English(en), Spanish(es), French(fr), Japanese(ja),
Russian(ru), Chinese(zh), mediawiki, commons.wikimedia
● Accessibility: Type of access
all-access, desktop, mobile
● Agent: Type of agent
all-agents, spider
The competition aims to forecast two months visits of 145k pages in Wikipedia.
The competition ends on 09/11/17 and the expected forecasting dates are from 09/13/17 to 11/13/17.
In this website, in order to better visualize the results, we are mostly using training set 1 in this stage
and compare the true visits data in training set 2.
(That is to say, we are using the dataset from 07/31/15 to 12/31/16 to forecast the visits from 01/01/17 to 03/01/17.)
After several try, we will move on to training set 2 using our finalized model.
Common statistical model to well-forecast the time series data.
Many assumptions behind the model might be challenging.
Common Deep Learning method used on time series data. Based on RNN, but fix the problem of long-term dependency.
Each model are discussed in their own sections, we will go deeper to show how they work.
Go ahead to explore each page for different models in the navigation bar, or follow the bottons
under each section for full story!
Mean Absolute percent Error (MAPE) is a measure of prediction accuracy of a forecasting method using the difference between ture value and predicted value divided by the actual value, then taking the absolute value, summing them up, and divided by the number of the amount of the values.
where A is the actual value and F is the forecast value. However, MAPE is simple and convincing but it cannot be used if there are zero values because there would be a division by zero, and that's why this competition is asking for the SMAPE value.
Symmetric Mean Absolute Percent Error (SMAPE) is an alternative method to MAPE when there are zero or near-zero demand for items. Since the low volume items have infinitely high error rates that skew the overall error rate, SMAPE self-limits to an error rate of 200% and reduces the influence of low volume items.
According to the leaderboard on Kaggle, the mean SMAPE value between 35.48065 and 40.33832 are ranked within top 20%, and value lower than 41.94324 are ranked within top 50% in this competition. Our results will be considered based on this leaderboard.