Exercise solutions: Section 9.11

Author

Rob J Hyndman and George Athanasopoulos

fpp3 9.11, Ex11

Choose one of the following seasonal time series: the Australian production of electricity, cement, or gas (from aus_production).

  1. Do the data need transforming? If so, find a suitable transformation.
aus_production |>
  autoplot(Electricity)

Yes, these need transforming.

lambda <- aus_production |>
  features(Electricity, guerrero) |>
  pull(lambda_guerrero)
aus_production |>
  autoplot(box_cox(Electricity, lambda))

guerrero() suggests using Box-Cox transformation with parameter \(\lambda=0.52\).

  1. Are the data stationary? If not, find an appropriate differencing which yields stationary data.

The trend and seasonality show that the data are not stationary.

aus_production |>
  gg_tsdisplay(box_cox(Electricity, lambda) |> difference(4), plot_type = "partial")

aus_production |>
  gg_tsdisplay(box_cox(Electricity, lambda) |> difference(4) |> difference(1), plot_type = "partial")

  • It seems that we could have continued with only taking seasonal differences. You may try this option.
  • We opt to take a first order difference as well.
  1. Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?

From the above graph, an AR(1) or an MA(1) with a seasonal MA(2) might work. So an ARIMA(1,1,0)(0,1,2) model for the transformed data.

fit <- aus_production |>
  model(
    manual = ARIMA(box_cox(Electricity, lambda) ~ 0 + pdq(1, 1, 0) + PDQ(0, 1, 2)),
    auto = ARIMA(box_cox(Electricity, lambda))
  )
fit |>
  select(auto) |>
  report()
Series: Electricity 
Model: ARIMA(1,1,4)(0,1,1)[4] 
Transformation: box_cox(Electricity, lambda) 

Coefficients:
          ar1     ma1      ma2      ma3      ma4     sma1
      -0.7030  0.2430  -0.4477  -0.1553  -0.2452  -0.5574
s.e.   0.1739  0.1943   0.0973   0.0931   0.1189   0.1087

sigma^2 estimated as 18.02:  log likelihood=-609.08
AIC=1232.17   AICc=1232.71   BIC=1255.7
glance(fit)
# A tibble: 2 × 8
  .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
  <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
1 manual   19.7   -620. 1247. 1248. 1261. <cpl [1]> <cpl [8]>
2 auto     18.0   -609. 1232. 1233. 1256. <cpl [1]> <cpl [8]>

Automatic model selection with ARIMA() has also taken a first order difference, and so we can compare the AICc values. This is a challenging ARIMA model to select manually and the automatic model is clearly better.

  1. Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
fit |>
  select(auto) |>
  gg_tsresiduals()

# to find out the dof for Ljung Box test
fit |> select(auto) |> report()
Series: Electricity 
Model: ARIMA(1,1,4)(0,1,1)[4] 
Transformation: box_cox(Electricity, lambda) 

Coefficients:
          ar1     ma1      ma2      ma3      ma4     sma1
      -0.7030  0.2430  -0.4477  -0.1553  -0.2452  -0.5574
s.e.   0.1739  0.1943   0.0973   0.0931   0.1189   0.1087

sigma^2 estimated as 18.02:  log likelihood=-609.08
AIC=1232.17   AICc=1232.71   BIC=1255.7
tidy(fit)
# A tibble: 9 × 6
  .model term  estimate std.error statistic  p.value
  <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 manual ar1    -0.346     0.0648    -5.34  2.33e- 7
2 manual sma1   -0.731     0.0876    -8.34  9.16e-15
3 manual sma2    0.0820    0.0858     0.956 3.40e- 1
4 auto   ar1    -0.703     0.174     -4.04  7.40e- 5
5 auto   ma1     0.243     0.194      1.25  2.13e- 1
6 auto   ma2    -0.448     0.0973    -4.60  7.15e- 6
7 auto   ma3    -0.155     0.0931    -1.67  9.68e- 2
8 auto   ma4    -0.245     0.119     -2.06  4.04e- 2
9 auto   sma1   -0.557     0.109     -5.13  6.52e- 7
fit |>
  select(auto) |>
  augment() |>
  features(.innov, ljung_box, dof = 6, lag = 12)
# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 auto      8.45     0.207

Residuals look reasonable, they resemble white noise.

  1. Forecast the next 24 months of data using your preferred model.
fit |>
  select(auto) |>
  forecast(h = "2 years") |>
  autoplot(aus_production)

  1. Compare the forecasts obtained using ETS().
aus_production |>
  model(ETS(Electricity)) |>
  forecast(h = "2 years") |>
  autoplot(aus_production)

aus_production |>
  model(ETS(Electricity)) |>
  forecast(h = "2 years") |>
  autoplot(aus_production |> filter(year(Quarter) >= 2000)) +
  autolayer(fit |> select(auto) |> forecast(h = "2 years"), colour = "red", alpha = 0.4)
Scale for fill_ramp is already present.
Adding another scale for fill_ramp, which will replace the existing scale.

  • The point forecasts appear to be quite similar.
  • The ETS forecasts have a wider forecast interval than the ARIMA forecasts.

fpp3 9.11, Ex12

For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?

lambda <- aus_production |>
  features(Electricity, guerrero) |>
  pull(lambda_guerrero)
stlarima <- decomposition_model(
  STL(box_cox(Electricity, lambda)),
  ARIMA(season_adjust)
)
fc <- aus_production |>
  model(
    ets = ETS(Electricity),
    arima = ARIMA(box_cox(Electricity, lambda)),
    stlarima = stlarima
  ) |>
  forecast(h = "2 years")
fc |> autoplot(aus_production |> filter(year(Quarter) > 2000), level=95)

The STL-ARIMA approach has higher values and narrower prediction intervals. It is hard to know which is best without comparing against a test set.

fpp3 9.11, Ex13

For the Australian tourism data (from tourism): a. Fit a suitable ARIMA model for all data. b. Produce forecasts of your fitted models. c. Check the forecasts for the “Snowy Mountains” and “Melbourne” regions. Do they look reasonable?

fit <- tourism |>
  model(arima = ARIMA(Trips))
fc <- fit |> forecast(h="3 years")
fc |>
  filter(Region == "Snowy Mountains") |>
  autoplot(tourism)

fc |>
  filter(Region == "Melbourne") |>
  autoplot(tourism)

Both sets of forecasts appear to have captured the underlying trend and seasonality effectively.

fpp3 9.11, Ex15

  1. The last five values of the series are given below:
Year 1931 1932 1933 1934 1935
Number of hare pelts 19520 82110 89760 81660 15760

The estimated parameters are \(c = 30993\), \(\phi_1 = 0.82\), \(\phi_2 = -0.29\), \(\phi_3 = -0.01\), and \(\phi_4 = -0.22\). Without using the forecast function, calculate forecasts for the next three years (1936–1939).

\[\begin{align*} \hat{y}_{T+1|T} & = 30993 + 0.82* 15760 -0.29* 81660 -0.01* 89760 -0.22* 82110 = 2051.57 \\ \hat{y}_{T+2|T} & = 30993 + 0.82* 2051.57 -0.29* 15760 -0.01* 81660 -0.22* 89760 = 8223.14 \\ \hat{y}_{T+3|T} & = 30993 + 0.82* 8223.14 -0.29* 2051.57 -0.01* 15760 -0.22* 81660 = 19387.96 \\ \end{align*}\]

  1. Now fit the model in R and obtain the forecasts using forecast. How are they different from yours? Why?
pelt |>
  model(ARIMA(Hare ~ pdq(4, 0, 0))) |>
  forecast(h=3)
# A fable: 3 x 4 [1Y]
# Key:     .model [1]
  .model  Year              Hare
  <chr>  <dbl>            <dist>
1 ARIMA…  1936  N(2052, 5.9e+08)
2 ARIMA…  1937  N(8223, 9.8e+08)
3 ARIMA…  1938 N(19388, 1.1e+09)
# ℹ 1 more variable: .mean <dbl>

Any differences will be due to rounding errors.

fpp3 9.11, Ex16

  1. The last five values of the series are given below.
Year 2013 2014 2015 2016 2017
Population (millions) 8.09 8.19 8.28 8.37 8.47

The estimated parameters are \(c = 0.0053\), \(\phi_1 = 1.64\), \(\phi_2 = -1.17\), and \(\phi_3 = 0.45\). Without using the forecast function, calculate forecasts for the next three years (2018–2020).

\[\begin{align*} \hat{y}_{T+1|T} & = 0.0053 + 8.47+ 1.64* (8.47 - 8.37) -1.17* (8.37 - 8.28) + 0.45* (8.28 - 8.19) = 8.56 \\ \hat{y}_{T+2|T} & = 0.0053 + 8.56+ 1.64* (8.56 - 8.47) -1.17* (8.47 - 8.37) + 0.45* (8.37 - 8.28) = 8.65 \\ \hat{y}_{T+3|T} & = 0.0053 + 8.65+ 1.64* (8.65 - 8.56) -1.17* (8.56 - 8.47) + 0.45* (8.47 - 8.37) = 8.73 \\ \end{align*}\]

  1. Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?
global_economy |>
  filter(Country == "Switzerland") |>
  mutate(Population = Population / 1e6) |>
  model(ARIMA(Population ~ 1 + pdq(3, 1, 0))) |>
  forecast(h=3)
# A fable: 3 x 5 [1Y]
# Key:     Country, .model [1]
  Country     .model                                Year
  <fct>       <chr>                                <dbl>
1 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0))  2018
2 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0))  2019
3 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0))  2020
# ℹ 2 more variables: Population <dist>, .mean <dbl>

Any differences will be due to rounding errors.