Exercise solutions: Section 9.11
fpp3 9.11, Ex11
Choose one of the following seasonal time series: the Australian production of electricity, cement, or gas (from aus_production).
- Do the data need transforming? If so, find a suitable transformation.
aus_production |>
  autoplot(Electricity)
Yes, the data need transforming: the size of the seasonal fluctuations increases with the level of the series.
lambda <- aus_production |>
  features(Electricity, guerrero) |>
  pull(lambda_guerrero)
aus_production |>
  autoplot(box_cox(Electricity, lambda))
guerrero() suggests a Box-Cox transformation with parameter \(\lambda = 0.52\).
- Are the data stationary? If not, find an appropriate differencing which yields stationary data.
The trend and seasonality show that the data are not stationary.
aus_production |>
  gg_tsdisplay(box_cox(Electricity, lambda) |> difference(4), plot_type = "partial")
aus_production |>
  gg_tsdisplay(box_cox(Electricity, lambda) |> difference(4) |> difference(1), plot_type = "partial")
- It seems that we could have continued with only taking seasonal differences; you may try this option (a unit-root check is sketched below).
- We opt to take a first-order difference as well.
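As a quick cross-check (a sketch, not part of the original solution), the unit-root features can suggest how many seasonal and first differences are needed for the transformed series; the helper column names below are arbitrary.
# Sketch: ask the unit-root tests how much differencing they recommend
# for the Box-Cox transformed series.
aus_production |>
  mutate(bc_elec = box_cox(Electricity, lambda)) |>
  features(bc_elec, unitroot_nsdiffs)
aus_production |>
  mutate(bc_elec_sdiff = difference(box_cox(Electricity, lambda), 4)) |>
  features(bc_elec_sdiff, unitroot_ndiffs)
Depending on the test output, a first difference after the seasonal difference may or may not be recommended; either choice is defensible here.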
- Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?
From the ACF and PACF plots above, a non-seasonal AR(1) or MA(1) combined with a seasonal MA(2) might work, so we try an ARIMA(1,1,0)(0,1,2)[4] model for the transformed data.
fit <- aus_production |>
  model(
    manual = ARIMA(box_cox(Electricity, lambda) ~ 0 + pdq(1, 1, 0) + PDQ(0, 1, 2)),
    auto = ARIMA(box_cox(Electricity, lambda))
  )
fit |>
  select(auto) |>
  report()
Series: Electricity
Model: ARIMA(1,1,4)(0,1,1)[4]
Transformation: box_cox(Electricity, lambda)
Coefficients:
ar1 ma1 ma2 ma3 ma4 sma1
-0.7030 0.2430 -0.4477 -0.1553 -0.2452 -0.5574
s.e. 0.1739 0.1943 0.0973 0.0931 0.1189 0.1087
sigma^2 estimated as 18.02: log likelihood=-609.08
AIC=1232.17 AICc=1232.71 BIC=1255.7
glance(fit)
# A tibble: 2 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 manual 19.7 -620. 1247. 1248. 1261. <cpl [1]> <cpl [8]>
2 auto 18.0 -609. 1232. 1233. 1256. <cpl [1]> <cpl [8]>
Automatic model selection with ARIMA() has also taken a first-order difference, so we can compare the AICc values. This is a challenging ARIMA model to select manually, and the automatic model is clearly better.
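If you want the automatic search to consider more candidate models than the default stepwise procedure, ARIMA() can be asked to do a more exhaustive search (a sketch, not part of the original solution; this can be slow):
# Sketch: a more thorough automatic search, disabling the stepwise
# shortcut and the approximation used during model selection.
aus_production |>
  model(
    auto_full = ARIMA(box_cox(Electricity, lambda),
                      stepwise = FALSE, approximation = FALSE)
  ) |>
  report()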
- Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
fit |>
  select(auto) |>
  gg_tsresiduals()
# to find out the dof for the Ljung-Box test
fit |> select(auto) |> report()
Series: Electricity
Model: ARIMA(1,1,4)(0,1,1)[4]
Transformation: box_cox(Electricity, lambda)
Coefficients:
ar1 ma1 ma2 ma3 ma4 sma1
-0.7030 0.2430 -0.4477 -0.1553 -0.2452 -0.5574
s.e. 0.1739 0.1943 0.0973 0.0931 0.1189 0.1087
sigma^2 estimated as 18.02: log likelihood=-609.08
AIC=1232.17 AICc=1232.71 BIC=1255.7
tidy(fit)
# A tibble: 9 × 6
.model term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 manual ar1 -0.346 0.0648 -5.34 2.33e- 7
2 manual sma1 -0.731 0.0876 -8.34 9.16e-15
3 manual sma2 0.0820 0.0858 0.956 3.40e- 1
4 auto ar1 -0.703 0.174 -4.04 7.40e- 5
5 auto ma1 0.243 0.194 1.25 2.13e- 1
6 auto ma2 -0.448 0.0973 -4.60 7.15e- 6
7 auto ma3 -0.155 0.0931 -1.67 9.68e- 2
8 auto ma4 -0.245 0.119 -2.06 4.04e- 2
9 auto sma1 -0.557 0.109 -5.13 6.52e- 7
fit |>
  select(auto) |>
  augment() |>
  features(.innov, ljung_box, dof = 6, lag = 12)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 auto 8.45 0.207
The residuals look reasonable; they resemble white noise.
- Forecast the next 24 months of data using your preferred model.
fit |>
  select(auto) |>
  forecast(h = "2 years") |>
  autoplot(aus_production)
- Compare the forecasts obtained using ETS().
aus_production |>
  model(ETS(Electricity)) |>
  forecast(h = "2 years") |>
  autoplot(aus_production)
aus_production |>
  model(ETS(Electricity)) |>
  forecast(h = "2 years") |>
  autoplot(aus_production |> filter(year(Quarter) >= 2000)) +
  autolayer(fit |> select(auto) |> forecast(h = "2 years"), colour = "red", alpha = 0.4)
- The point forecasts appear to be quite similar.
- The ETS forecasts have wider prediction intervals than the ARIMA forecasts (a quick numerical comparison is sketched below).
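To put a number on the interval comparison (a sketch, not part of the original solution, assuming quantile() can be applied to the forecast distribution column), the 95% interval widths can be computed from the forecast distributions:
# Sketch: compare average 95% prediction-interval widths for the
# ARIMA and ETS forecasts.
aus_production |>
  model(
    arima = ARIMA(box_cox(Electricity, lambda)),
    ets = ETS(Electricity)
  ) |>
  forecast(h = "2 years") |>
  as_tibble() |>
  mutate(width = quantile(Electricity, 0.975) - quantile(Electricity, 0.025)) |>
  group_by(.model) |>
  summarise(mean_width = mean(width))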
fpp3 9.11, Ex12
For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?
lambda <- aus_production |>
  features(Electricity, guerrero) |>
  pull(lambda_guerrero)
stlarima <- decomposition_model(
  STL(box_cox(Electricity, lambda)),
  ARIMA(season_adjust)
)
fc <- aus_production |>
  model(
    ets = ETS(Electricity),
    arima = ARIMA(box_cox(Electricity, lambda)),
    stlarima = stlarima
  ) |>
  forecast(h = "2 years")
fc |> autoplot(aus_production |> filter(year(Quarter) > 2000), level = 95)
The STL-ARIMA approach gives higher point forecasts and narrower prediction intervals. It is hard to know which approach is best without comparing them against a test set, as sketched below.
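One way to decide (a sketch, not part of the original solution; the holdout length is arbitrary) is to withhold the last three years as a test set and compare forecast accuracy:
# Sketch: refit the three models on a training set ending in 2007 Q2
# and compare test-set accuracy over the final three years.
train <- aus_production |>
  filter(Quarter < yearquarter("2007 Q3"))
train |>
  model(
    ets = ETS(Electricity),
    arima = ARIMA(box_cox(Electricity, lambda)),
    stlarima = stlarima
  ) |>
  forecast(h = "3 years") |>
  accuracy(aus_production)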
fpp3 9.11, Ex13
For the Australian tourism data (from tourism):
a. Fit a suitable ARIMA model for all data.
b. Produce forecasts of your fitted models.
c. Check the forecasts for the “Snowy Mountains” and “Melbourne” regions. Do they look reasonable?
fit <- tourism |>
  model(arima = ARIMA(Trips))
fc <- fit |> forecast(h = "3 years")
fc |>
  filter(Region == "Snowy Mountains") |>
  autoplot(tourism)
fc |>
  filter(Region == "Melbourne") |>
  autoplot(tourism)
Both sets of forecasts appear to have captured the underlying trend and seasonality effectively; a quick residual check for one of these series is sketched below.
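For example (a sketch, not part of the original solution), the residual diagnostics for a single series can be inspected by filtering the mable to one key before calling gg_tsresiduals():
# Sketch: residual diagnostics for the Snowy Mountains holiday series.
fit |>
  filter(Region == "Snowy Mountains", Purpose == "Holiday") |>
  gg_tsresiduals()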
fpp3 9.11, Ex15
- The last five values of the series are given below:
| Year | 1931 | 1932 | 1933 | 1934 | 1935 |
|---|---|---|---|---|---|
| Number of hare pelts | 19520 | 82110 | 89760 | 81660 | 15760 |
The estimated parameters are \(c = 30993\), \(\phi_1 = 0.82\), \(\phi_2 = -0.29\), \(\phi_3 = -0.01\), and \(\phi_4 = -0.22\). Without using the forecast() function, calculate forecasts for the next three years (1936–1938).
\[\begin{align*}
\hat{y}_{T+1|T} & = 30993 + 0.82 \times 15760 - 0.29 \times 81660 - 0.01 \times 89760 - 0.22 \times 82110 = 2051.57 \\
\hat{y}_{T+2|T} & = 30993 + 0.82 \times 2051.57 - 0.29 \times 15760 - 0.01 \times 81660 - 0.22 \times 89760 = 8223.14 \\
\hat{y}_{T+3|T} & = 30993 + 0.82 \times 8223.14 - 0.29 \times 2051.57 - 0.01 \times 15760 - 0.22 \times 81660 = 19387.96
\end{align*}\]
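These hand calculations can be replicated in R without forecast() (a sketch, not part of the original solution) by pulling the estimated coefficients from a fitted AR(4) model and iterating the recursion; using the full-precision estimates rather than the two-decimal values above avoids small rounding discrepancies. The term label "constant" for the intercept is an assumption about how fable reports it.
# Sketch: iterate the AR(4) recursion by hand using the full-precision
# coefficient estimates (assumes the intercept term is labelled "constant").
fit <- pelt |>
  model(ar4 = ARIMA(Hare ~ pdq(4, 0, 0)))
est <- tidy(fit)
phi <- est |> filter(grepl("^ar", term)) |> pull(estimate)    # phi1, ..., phi4
c0 <- est |> filter(term == "constant") |> pull(estimate)     # c
y <- tail(pelt$Hare, 4)                       # last four observations
for (h in 1:3) {
  y_next <- c0 + sum(phi * rev(tail(y, 4)))   # c + phi1*y_T + ... + phi4*y_{T-3}
  y <- c(y, y_next)
}
tail(y, 3)                                    # point forecasts for 1936-1938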
- Now fit the model in R and obtain the forecasts using forecast(). How are they different from yours? Why?
pelt |>
  model(ARIMA(Hare ~ pdq(4, 0, 0))) |>
  forecast(h = 3)
# A fable: 3 x 4 [1Y]
# Key: .model [1]
.model Year Hare
<chr> <dbl> <dist>
1 ARIMA… 1936 N(2052, 5.9e+08)
2 ARIMA… 1937 N(8223, 9.8e+08)
3 ARIMA… 1938 N(19388, 1.1e+09)
# ℹ 1 more variable: .mean <dbl>
Any differences are due to rounding: the parameters above are given to only two decimal places, whereas forecast() uses the full-precision estimates.
fpp3 9.11, Ex16
- The last five values of the series are given below.
| Year | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|
| Population (millions) | 8.09 | 8.19 | 8.28 | 8.37 | 8.47 |
The estimated parameters are \(c = 0.0053\), \(\phi_1 = 1.64\), \(\phi_2 = -1.17\), and \(\phi_3 = 0.45\). Without using the forecast() function, calculate forecasts for the next three years (2018–2020).
\[\begin{align*}
\hat{y}_{T+1|T} & = 0.0053 + 8.47 + 1.64 \times (8.47 - 8.37) - 1.17 \times (8.37 - 8.28) + 0.45 \times (8.28 - 8.19) = 8.56 \\
\hat{y}_{T+2|T} & = 0.0053 + 8.56 + 1.64 \times (8.56 - 8.47) - 1.17 \times (8.47 - 8.37) + 0.45 \times (8.37 - 8.28) = 8.65 \\
\hat{y}_{T+3|T} & = 0.0053 + 8.65 + 1.64 \times (8.65 - 8.56) - 1.17 \times (8.56 - 8.47) + 0.45 \times (8.47 - 8.37) = 8.73
\end{align*}\]
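The same kind of check can be done here (a sketch, not part of the original solution); because the model is for the first differences, the recursion runs on the differenced series and the forecasts are obtained by adding the forecast differences back onto the last observed level. Again, the intercept label "constant" is an assumption about fable's output.
# Sketch: iterate the ARIMA(3,1,0)-with-constant recursion on the
# differenced series, then cumulate back to population levels.
swiss <- global_economy |>
  filter(Country == "Switzerland") |>
  mutate(Population = Population / 1e6)
fit <- swiss |>
  model(ARIMA(Population ~ 1 + pdq(3, 1, 0)))
est <- tidy(fit)
phi <- est |> filter(grepl("^ar", term)) |> pull(estimate)    # phi1, phi2, phi3
c0 <- est |> filter(term == "constant") |> pull(estimate)     # c
y <- tail(swiss$Population, 4)                # last four observations
d <- diff(y)                                  # last three first differences
for (h in 1:3) {
  d_next <- c0 + sum(phi * rev(tail(d, 3)))   # c + phi1*d_T + phi2*d_{T-1} + phi3*d_{T-2}
  d <- c(d, d_next)
  y <- c(y, tail(y, 1) + d_next)              # add the difference back to the level
}
tail(y, 3)                                    # point forecasts for 2018-2020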
- Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?
global_economy |>
  filter(Country == "Switzerland") |>
  mutate(Population = Population / 1e6) |>
  model(ARIMA(Population ~ 1 + pdq(3, 1, 0))) |>
  forecast(h = 3)
# A fable: 3 x 5 [1Y]
# Key: Country, .model [1]
Country .model Year
<fct> <chr> <dbl>
1 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2018
2 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2019
3 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2020
# ℹ 2 more variables: Population <dist>, .mean <dbl>
Again, any differences are due to rounding of the reported parameter values.