Exercise solutions: Section 8.8
fpp3 8.8, Ex5
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
- Plot the Exports series and discuss the main features of the data.
global_economy |>
filter(Country == "Argentina") |>
autoplot(Exports)
There is a huge jump in Exports in 2002, due to the deregulation of the Argentinian peso. Since then, Exports (as a percentage of GDP) have gradually returned to 1990 levels.
- Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
etsANN <- global_economy |>
filter(Country == "Argentina") |>
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
etsANN |>
forecast(h = 10) |>
autoplot(global_economy)
- Compute the RMSE values for the training data.
accuracy(etsANN) |> select(RMSE)
# A tibble: 1 × 1
RMSE
<dbl>
1 2.78
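As a sanity check, the training RMSE can be reproduced directly from the residuals. This is a sketch, assuming the `etsANN` mable fitted above is still available:

```r
# RMSE is the square root of the mean squared training residual
augment(etsANN) |>
  as_tibble() |>
  summarise(RMSE = sqrt(mean(.resid^2)))
```

This should match the value reported by accuracy(), since the training RMSE makes no degrees-of-freedom adjustment.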
- Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
fit <- global_economy |>
filter(Country == "Argentina") |>
model(
ses = ETS(Exports ~ error("A") + trend("N") + season("N")),
holt = ETS(Exports ~ error("A") + trend("A") + season("N"))
)
accuracy(fit)
# A tibble: 2 × 11
Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Argentina ses Training 0.0762 2.78 1.62 -1.73 15.7 0.983 0.986 0.00902
2 Argentina holt Training 0.00795 2.78 1.64 -2.51 15.9 0.994 0.986 0.0271
There is very little difference in training RMSE between these models, so the extra parameter is not doing much.
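Because RMSE ignores model complexity, a comparison that penalises the extra parameters is more informative. A sketch using the `fit` mable from above:

```r
# AICc penalises the additional trend parameters in the Holt model
fit |>
  glance() |>
  select(.model, AICc)
```

If the trend adds little, the simpler SES model should have the lower (better) AICc.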
- Compare the forecasts from both methods. Which do you think is best?
fit |>
forecast(h = 10) |>
autoplot(global_economy)
- The forecasts are similar. In this case, the simpler model is preferred.
- Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
- standard error (estimated by the training RMSE)
- mean (the point forecast)
s <- accuracy(fit) |> pull(RMSE)
yhat <- forecast(fit, h = 1) |> pull(.mean)
# SES
yhat[1] + c(-1, 1) * qnorm(0.975) * s[1]
[1] 5.882074 16.764136
# Holt
yhat[2] + c(-1, 1) * qnorm(0.975) * s[2]
[1] 5.989515 16.872908
fit |>
forecast(h = 1) |>
mutate(PI = hilo(Exports, level = 95))
# A fable: 2 x 6 [1Y]
# Key: Country, .model [2]
Country .model Year
<fct> <chr> <dbl>
1 Argentina ses 2018
2 Argentina holt 2018
# ℹ 3 more variables: Exports <dist>, .mean <dbl>, PI <hilo>
The intervals computed from the RMSE are narrower than those produced by the hilo() function. Using the RMSE fails to take account of the degrees of freedom used by each model. Compare the following, where the residual variance is adjusted for the number of estimated parameters.
sse <- augment(fit) |>
as_tibble() |>
group_by(.model) |>
summarise(s = sum(.resid^2)) |>
pull(s)
n <- global_economy |>
filter(Country == "Argentina") |>
nrow()
# ses: alpha, l[0] => 2 parameters
# holt: alpha, beta, l[0], b[0] => 4 parameters
s <- sqrt(sse / (n - c(2, 4)))
# SES
yhat[1] + c(-1, 1) * qnorm(0.975) * s[1]
[1] 5.785088 16.861122
# Holt
yhat[2] + c(-1, 1) * qnorm(0.975) * s[2]
[1] 5.79226 17.07016
fpp3 8.8, Ex7
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
aus_production |> autoplot(Gas)
- There is a huge increase in variance as the series increases in level, which makes it necessary to use multiplicative seasonality.
fit <- aus_production |>
model(
hw = ETS(Gas ~ error("M") + trend("A") + season("M")),
hwdamped = ETS(Gas ~ error("M") + trend("Ad") + season("M")),
)
fit |> glance()
# A tibble: 2 × 9
.model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 hw 0.00324 -831. 1681. 1682. 1711. 21.1 32.2 0.0413
2 hwdamped 0.00329 -832. 1684. 1685. 1718. 21.1 32.0 0.0417
- The non-damped model seems to be doing slightly better here, probably because the trend is very strong over most of the historical data.
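With phi estimated at about 0.98, the damping is very mild, so the two models should produce similar forecasts over a moderate horizon. A sketch overlaying the point forecasts, using the `fit` mable from above:

```r
# Overlay damped and non-damped Holt-Winters point forecasts
fit |>
  forecast(h = 36) |>
  autoplot(aus_production, level = NULL) +
  labs(title = "Damped vs non-damped Holt-Winters forecasts for Gas")
```

Any divergence between the two sets of forecasts only becomes visible at longer horizons, where the damped trend flattens out.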
fit |>
select(hw) |>
gg_tsresiduals()
fit |> tidy()
# A tibble: 19 × 3
.model term estimate
<chr> <chr> <dbl>
1 hw alpha 0.653
2 hw beta 0.144
3 hw gamma 0.0978
4 hw l[0] 5.95
5 hw b[0] 0.0706
6 hw s[0] 0.931
7 hw s[-1] 1.18
8 hw s[-2] 1.07
9 hw s[-3] 0.816
10 hwdamped alpha 0.649
11 hwdamped beta 0.155
12 hwdamped gamma 0.0937
13 hwdamped phi 0.980
14 hwdamped l[0] 5.86
15 hwdamped b[0] 0.0994
16 hwdamped s[0] 0.928
17 hwdamped s[-1] 1.18
18 hwdamped s[-2] 1.08
19 hwdamped s[-3] 0.817
fit |>
augment() |>
filter(.model == "hw") |>
features(.innov, ljung_box, lag = 24)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 hw 57.1 0.000161
- There are still some small correlations left in the residuals, showing that the model has not fully captured the available information.
- There also appears to be some heteroskedasticity in the residuals, with larger variance in the first half of the series.
fit |>
forecast(h = 36) |>
filter(.model == "hw") |>
autoplot(aus_production)
While the point forecasts look OK, the intervals are excessively wide.
fpp3 8.8, Ex11
For this exercise use the quarterly number of arrivals to Australia from New Zealand, 1981 Q1 – 2012 Q3, from data set
aus_arrivals.
- Make a time plot of your data and describe the main features of the series.
nzarrivals <- aus_arrivals |> filter(Origin == "NZ")
nzarrivals |> autoplot(Arrivals / 1e3) + labs(y = "Thousands of people")
- The data has an upward trend.
- The data has a seasonal pattern which increases in size approximately proportionally to the average number of people who arrive per year. Therefore, the data has multiplicative seasonality.
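One way to check this visually is to plot the series on a log scale; if the seasonality is multiplicative, the seasonal fluctuations become roughly constant in size after the transformation. A sketch, using the `nzarrivals` tsibble defined above:

```r
# Under multiplicative seasonality, log() stabilises the seasonal variation
nzarrivals |>
  autoplot(log(Arrivals)) +
  labs(y = "log(Arrivals)")
```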
- Create a training set that withholds the last two years of available data. Forecast the test set using an appropriate model for Holt-Winters’ multiplicative method.
nz_tr <- nzarrivals |>
slice(1:(n() - 8))
nz_tr |>
model(ETS(Arrivals ~ error("M") + trend("A") + season("M"))) |>
forecast(h = "2 years") |>
autoplot() +
autolayer(nzarrivals, Arrivals)
- Why is multiplicative seasonality necessary here?
- The multiplicative seasonality is important in this example because the seasonal pattern increases in size proportionally to the level of the series.
- The behaviour of the seasonal pattern will be captured and projected in a model with multiplicative seasonality.
- Forecast the two-year test set using each of the following methods:
- an ETS model;
- an additive ETS model applied to a log transformed series;
- a seasonal naïve method;
- an STL decomposition applied to the log transformed data followed by an ETS model applied to the seasonally adjusted (transformed) data.
fc <- nz_tr |>
model(
ets = ETS(Arrivals),
log_ets = ETS(log(Arrivals)),
snaive = SNAIVE(Arrivals),
stl = decomposition_model(STL(log(Arrivals)), ETS(season_adjust))
) |>
forecast(h = "2 years")
fc |>
autoplot(level = NULL) +
autolayer(filter(nzarrivals, year(Quarter) > 2000), Arrivals)
fc |>
autoplot(level = NULL) +
autolayer(nzarrivals, Arrivals)
- Which method gives the best forecasts? Does it pass the residual tests?
fc |>
accuracy(nzarrivals)
# A tibble: 4 × 11
.model Origin .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ets NZ Test -3495. 14913. 11421. -0.964 3.78 0.768 0.771 -0.0260
2 log_ets NZ Test 2467. 13342. 11904. 1.03 4.03 0.800 0.689 -0.0786
3 snaive NZ Test 9709. 18051. 17156. 3.44 5.80 1.15 0.933 -0.239
4 stl NZ Test -12535. 22723. 16172. -4.02 5.23 1.09 1.17 0.109
- The best method is the ETS model on the logged data (based on RMSE), and it passes the residual tests.
log_ets <- nz_tr |>
model(ETS(log(Arrivals)))
log_ets |> gg_tsresiduals()
augment(log_ets) |>
features(.innov, ljung_box, lag = 12)
# A tibble: 1 × 4
Origin .model lb_stat lb_pvalue
<chr> <chr> <dbl> <dbl>
1 NZ ETS(log(Arrivals)) 11.0 0.530
- Compare the same four methods using time series cross-validation instead of using a training and test set. Do you come to the same conclusions?
nz_cv <- nzarrivals |>
slice(1:(n() - 3)) |>
stretch_tsibble(.init = 36, .step = 3)
nz_cv |>
model(
ets = ETS(Arrivals),
log_ets = ETS(log(Arrivals)),
snaive = SNAIVE(Arrivals),
stl = decomposition_model(STL(log(Arrivals)), ETS(season_adjust))
) |>
forecast(h = 3) |>
accuracy(nzarrivals)
# A tibble: 4 × 11
.model Origin .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ets NZ Test 4627. 15327. 11799. 2.23 6.45 0.793 0.797 0.283
2 log_ets NZ Test 4388. 15047. 11566. 1.99 6.36 0.778 0.782 0.268
3 snaive NZ Test 8244. 18768. 14422. 3.83 7.76 0.970 0.976 0.566
4 stl NZ Test 4252. 15618. 11873. 2.04 6.25 0.798 0.812 0.244
- An initial fold size (.init) of 36 has been selected to ensure that sufficient data is available to make reasonable forecasts.
- A step size of 3 (and a forecast horizon of 3) has been used to reduce the computation time.
- The ETS model on the log data still appears best (based on 3-step ahead forecast RMSE).
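The number of cross-validation folds these settings produce can be checked directly. A sketch, using the `nz_cv` object created above (stretch_tsibble() adds a fold identifier `.id`):

```r
# Each fold is identified by .id, so the largest .id is the fold count
nz_cv |>
  as_tibble() |>
  summarise(folds = max(.id))
```

More folds mean more model fits, which is why the larger step size keeps the computation manageable.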
fpp3 8.8, Ex14
- Use ETS() to select an appropriate model for the following series: total number of trips across Australia using tourism, the closing prices for the four stocks in gafa_stock, and the lynx series in pelt. Does it always give good forecasts?
tourism
aus_trips <- tourism |>
summarise(Trips = sum(Trips))
aus_trips |>
model(ETS(Trips)) |>
report()
Series: Trips
Model: ETS(A,A,A)
Smoothing parameters:
alpha = 0.4495675
beta = 0.04450178
gamma = 0.0001000075
Initial states:
l[0] b[0] s[0] s[-1] s[-2] s[-3]
21689.64 -58.46946 -125.8548 -816.3416 -324.5553 1266.752
sigma^2: 699901.4
AIC AICc BIC
1436.829 1439.400 1458.267
aus_trips |>
model(ETS(Trips)) |>
forecast() |>
autoplot(aus_trips)
Forecasts appear reasonable.
GAFA stock
gafa_regular <- gafa_stock |>
group_by(Symbol) |>
mutate(trading_day = row_number()) |>
ungroup() |>
as_tsibble(index = trading_day, regular = TRUE)
gafa_stock |> autoplot(Close)
gafa_regular |>
model(ETS(Close))
# A mable: 4 x 2
# Key: Symbol [4]
Symbol `ETS(Close)`
<chr> <model>
1 AAPL <ETS(M,N,N)>
2 AMZN <ETS(M,N,N)>
3 FB <ETS(M,N,N)>
4 GOOG <ETS(M,N,N)>
gafa_regular |>
model(ETS(Close)) |>
forecast(h = 50) |>
autoplot(gafa_regular |> group_by_key() |> slice((n() - 100):n()))
`mutate_if()` ignored the following grouping variables:
• Column `Symbol`
Forecasts look reasonable for an efficient market.
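The selected ETS(M,N,N) models behave much like naïve forecasts, which is what the efficient-market view predicts for stock prices. This can be checked by inspecting the estimated smoothing parameters; an alpha close to 1 makes each one-step forecast essentially the last observed price. A sketch, using the `gafa_regular` tsibble defined above:

```r
# alpha near 1 => forecasts track the most recent observation,
# as for a naive (random walk) method
gafa_regular |>
  model(ETS(Close)) |>
  tidy() |>
  filter(term == "alpha")
```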
Pelt trading records
pelt |>
model(ETS(Lynx))
# A mable: 1 x 1
`ETS(Lynx)`
<model>
1 <ETS(A,N,N)>
pelt |>
model(ETS(Lynx)) |>
forecast(h = 10) |>
autoplot(pelt)
- Here the cyclic behaviour of the lynx data is completely lost.
- ETS models are not designed to handle cyclic data, so there is nothing that can be done within this model class to improve the forecasts.
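The lost cyclic behaviour shows up in the residuals: the roughly ten-year lynx cycle remains as strong autocorrelation that the ETS(A,N,N) model cannot absorb. A sketch of the residual ACF (ACF() is from the feasts package):

```r
# Cyclic structure left in the residuals appears as large
# autocorrelations at lags matching the lynx cycle
pelt |>
  model(ETS(Lynx)) |>
  augment() |>
  ACF(.innov) |>
  autoplot()
```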
- Find an example where it does not work well. Can you figure out why?
- ETS does not work well on cyclic data, as seen in the pelt dataset above.