|>
aus_production autoplot(Electricity)
Exercise solutions: Section 9.11
fpp3 9.11, Ex11
Choose one of the following seasonal time series: the Australian production of electricity, cement, or gas (from
aus_production
).
- Do the data need transforming? If so, find a suitable transformation.
Yes, these need transforming.
<- aus_production |>
lambda features(Electricity, guerrero) |>
pull(lambda_guerrero)
|>
aus_production autoplot(box_cox(Electricity, lambda))
guerrero()
suggests using Box-Cox transformation with parameter \(\lambda=0.52\).
- Are the data stationary? If not, find an appropriate differencing which yields stationary data.
The trend and seasonality show that the data are not stationary.
|>
aus_production gg_tsdisplay(box_cox(Electricity, lambda) |> difference(4), plot_type = "partial")
|>
aus_production gg_tsdisplay(box_cox(Electricity, lambda) |> difference(4) |> difference(1), plot_type = "partial")
- It seems that we could have continued with only taking seasonal differences. You may try this option.
- We opt to take a first order difference as well.
- Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?
From the above graph, an AR(1) or an MA(1) with a seasonal MA(2) might work. So an ARIMA(1,1,0)(0,1,2) model for the transformed data.
<- aus_production |>
fit model(
manual = ARIMA(box_cox(Electricity, lambda) ~ 0 + pdq(1, 1, 0) + PDQ(0, 1, 2)),
auto = ARIMA(box_cox(Electricity, lambda))
)|>
fit select(auto) |>
report()
Series: Electricity
Model: ARIMA(1,1,4)(0,1,1)[4]
Transformation: box_cox(Electricity, lambda)
Coefficients:
ar1 ma1 ma2 ma3 ma4 sma1
-0.7030 0.2430 -0.4477 -0.1553 -0.2452 -0.5574
s.e. 0.1739 0.1943 0.0973 0.0931 0.1189 0.1087
sigma^2 estimated as 18.02: log likelihood=-609.08
AIC=1232.17 AICc=1232.71 BIC=1255.7
glance(fit)
# A tibble: 2 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 manual 19.7 -620. 1247. 1248. 1261. <cpl [1]> <cpl [8]>
2 auto 18.0 -609. 1232. 1233. 1256. <cpl [1]> <cpl [8]>
Automatic model selection with ARIMA()
has also taken a first order difference, and so we can compare the AICc values. This is a challenging ARIMA model to select manually and the automatic model is clearly better.
- Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
|>
fit select(auto) |>
gg_tsresiduals()
# to find out the dof for Ljung Box test
|> select(auto) |> report() fit
Series: Electricity
Model: ARIMA(1,1,4)(0,1,1)[4]
Transformation: box_cox(Electricity, lambda)
Coefficients:
ar1 ma1 ma2 ma3 ma4 sma1
-0.7030 0.2430 -0.4477 -0.1553 -0.2452 -0.5574
s.e. 0.1739 0.1943 0.0973 0.0931 0.1189 0.1087
sigma^2 estimated as 18.02: log likelihood=-609.08
AIC=1232.17 AICc=1232.71 BIC=1255.7
tidy(fit)
# A tibble: 9 × 6
.model term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 manual ar1 -0.346 0.0648 -5.34 2.33e- 7
2 manual sma1 -0.731 0.0876 -8.34 9.16e-15
3 manual sma2 0.0820 0.0858 0.956 3.40e- 1
4 auto ar1 -0.703 0.174 -4.04 7.40e- 5
5 auto ma1 0.243 0.194 1.25 2.13e- 1
6 auto ma2 -0.448 0.0973 -4.60 7.15e- 6
7 auto ma3 -0.155 0.0931 -1.67 9.68e- 2
8 auto ma4 -0.245 0.119 -2.06 4.04e- 2
9 auto sma1 -0.557 0.109 -5.13 6.52e- 7
|>
fit select(auto) |>
augment() |>
features(.innov, ljung_box, dof = 6, lag = 12)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 auto 8.45 0.207
Residuals look reasonable, they resemble white noise.
- Forecast the next 24 months of data using your preferred model.
|>
fit select(auto) |>
forecast(h = "2 years") |>
autoplot(aus_production)
- Compare the forecasts obtained using
ETS()
.
|>
aus_production model(ETS(Electricity)) |>
forecast(h = "2 years") |>
autoplot(aus_production)
|>
aus_production model(ETS(Electricity)) |>
forecast(h = "2 years") |>
autoplot(aus_production |> filter(year(Quarter) >= 2000)) +
autolayer(fit |> select(auto) |> forecast(h = "2 years"), colour = "red", alpha = 0.4)
Scale for fill_ramp is already present.
Adding another scale for fill_ramp, which will replace the existing scale.
- The point forecasts appear to be quite similar.
- The ETS forecasts have a wider forecast interval than the ARIMA forecasts.
fpp3 9.11, Ex12
For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?
<- aus_production |>
lambda features(Electricity, guerrero) |>
pull(lambda_guerrero)
<- decomposition_model(
stlarima STL(box_cox(Electricity, lambda)),
ARIMA(season_adjust)
)<- aus_production |>
fc model(
ets = ETS(Electricity),
arima = ARIMA(box_cox(Electricity, lambda)),
stlarima = stlarima
|>
) forecast(h = "2 years")
|> autoplot(aus_production |> filter(year(Quarter) > 2000), level=95) fc
The STL-ARIMA approach has higher values and narrower prediction intervals. It is hard to know which is best without comparing against a test set.
fpp3 9.11, Ex13
For the Australian tourism data (from
tourism
): a. Fit a suitable ARIMA model for all data. b. Produce forecasts of your fitted models. c. Check the forecasts for the “Snowy Mountains” and “Melbourne” regions. Do they look reasonable?
<- tourism |>
fit model(arima = ARIMA(Trips))
<- fit |> forecast(h="3 years")
fc |>
fc filter(Region == "Snowy Mountains") |>
autoplot(tourism)
|>
fc filter(Region == "Melbourne") |>
autoplot(tourism)
Both sets of forecasts appear to have captured the underlying trend and seasonality effectively.
fpp3 9.11, Ex15
- The last five values of the series are given below:
Year | 1931 | 1932 | 1933 | 1934 | 1935 |
---|---|---|---|---|---|
Number of hare pelts | 19520 | 82110 | 89760 | 81660 | 15760 |
The estimated parameters are \(c = 30993\), \(\phi_1 = 0.82\), \(\phi_2 = -0.29\), \(\phi_3 = -0.01\), and \(\phi_4 = -0.22\). Without using the
forecast
function, calculate forecasts for the next three years (1936–1939).
\[\begin{align*} \hat{y}_{T+1|T} & = 30993 + 0.82* 15760 -0.29* 81660 -0.01* 89760 -0.22* 82110 = 2051.57 \\ \hat{y}_{T+2|T} & = 30993 + 0.82* 2051.57 -0.29* 15760 -0.01* 81660 -0.22* 89760 = 8223.14 \\ \hat{y}_{T+3|T} & = 30993 + 0.82* 8223.14 -0.29* 2051.57 -0.01* 15760 -0.22* 81660 = 19387.96 \\ \end{align*}\]
- Now fit the model in R and obtain the forecasts using
forecast
. How are they different from yours? Why?
|>
pelt model(ARIMA(Hare ~ pdq(4, 0, 0))) |>
forecast(h=3)
# A fable: 3 x 4 [1Y]
# Key: .model [1]
.model Year Hare
<chr> <dbl> <dist>
1 ARIMA… 1936 N(2052, 5.9e+08)
2 ARIMA… 1937 N(8223, 9.8e+08)
3 ARIMA… 1938 N(19388, 1.1e+09)
# ℹ 1 more variable: .mean <dbl>
Any differences will be due to rounding errors.
fpp3 9.11, Ex16
- The last five values of the series are given below.
Year | 2013 | 2014 | 2015 | 2016 | 2017 |
---|---|---|---|---|---|
Population (millions) | 8.09 | 8.19 | 8.28 | 8.37 | 8.47 |
The estimated parameters are \(c = 0.0053\), \(\phi_1 = 1.64\), \(\phi_2 = -1.17\), and \(\phi_3 = 0.45\). Without using the
forecast
function, calculate forecasts for the next three years (2018–2020).
\[\begin{align*} \hat{y}_{T+1|T} & = 0.0053 + 8.47+ 1.64* (8.47 - 8.37) -1.17* (8.37 - 8.28) + 0.45* (8.28 - 8.19) = 8.56 \\ \hat{y}_{T+2|T} & = 0.0053 + 8.56+ 1.64* (8.56 - 8.47) -1.17* (8.47 - 8.37) + 0.45* (8.37 - 8.28) = 8.65 \\ \hat{y}_{T+3|T} & = 0.0053 + 8.65+ 1.64* (8.65 - 8.56) -1.17* (8.56 - 8.47) + 0.45* (8.47 - 8.37) = 8.73 \\ \end{align*}\]
- Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?
|>
global_economy filter(Country == "Switzerland") |>
mutate(Population = Population / 1e6) |>
model(ARIMA(Population ~ 1 + pdq(3, 1, 0))) |>
forecast(h=3)
# A fable: 3 x 5 [1Y]
# Key: Country, .model [1]
Country .model Year
<fct> <chr> <dbl>
1 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2018
2 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2019
3 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2020
# ℹ 2 more variables: Population <dist>, .mean <dbl>
Any differences will be due to rounding errors.