Exercise solutions: Section 5.11

Author

Rob J Hyndman and George Athanasopoulos

fpp3 5.11, Ex 1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

  • Australian Population (global_economy)
  • Bricks (aus_production)
  • NSW Lambs (aus_livestock)
  • Household wealth (hh_budget)
  • Australian takeaway food turnover (aus_retail)

Australian population

global_economy |>
  filter(Country == "Australia") |>
  autoplot(Population)

The data have a trend and no seasonality, so a random walk with drift model is appropriate.

global_economy |>
  filter(Country == "Australia") |>
  model(RW(Population ~ drift())) |>
  forecast(h = "10 years") |>
  autoplot(global_economy)

Australian clay brick production

aus_production |>
  filter(!is.na(Bricks)) |>
  autoplot(Bricks) +
  labs(title = "Clay brick production")

This data appears to have more seasonality than trend, so of the models available, seasonal naive is most appropriate.

aus_production |>
  filter(!is.na(Bricks)) |>
  model(SNAIVE(Bricks)) |>
  forecast(h = "5 years") |>
  autoplot(aus_production)

NSW Lambs

nsw_lambs <- aus_livestock |>
  filter(State == "New South Wales", Animal == "Lambs")
nsw_lambs |>
  autoplot(Count)

This data appears to have more seasonality than trend, so of the models available, seasonal naive is most appropriate.

nsw_lambs |>
  model(SNAIVE(Count)) |>
  forecast(h = "5 years") |>
  autoplot(nsw_lambs)

Household wealth

hh_budget |>
  autoplot(Wealth)

This is annual data with an upward trend, so we can use a random walk with drift.

hh_budget |>
  model(RW(Wealth ~ drift())) |>
  forecast(h = "5 years") |>
  autoplot(hh_budget)

Australian takeaway food turnover

takeaway <- aus_retail |>
  filter(Industry == "Takeaway food services") |>
  summarise(Turnover = sum(Turnover))
takeaway |> autoplot(Turnover)

This data has strong seasonality and strong trend, so we will use a seasonal naive model with drift.

takeaway |>
  model(SNAIVE(Turnover ~ drift())) |>
  forecast(h = "5 years") |>
  autoplot(takeaway)

This is actually not one of the four benchmark methods discussed in the book, but is sometimes a useful benchmark when there is strong seasonality and strong trend.

The corresponding equation is \[ \hat{y}_{T+h|T} = y_{T+h-m(k+1)} + \frac{h}{T-m}\sum_{t=m+1}^T(y_t - y_{t-m}), \] where \(m=12\) and \(k\) is the integer part of \((h-1)/m\) (i.e., the number of complete years in the forecast period prior to time \(T+h\)).
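To make the equation concrete, here is a small sketch (not part of the original solution) that computes the drift term and one point forecast by hand, assuming the takeaway tsibble defined above is available; the variable names are illustrative only.

# Hand computation of the SNAIVE-with-drift equation above
y <- takeaway$Turnover
n <- length(y)                             # T in the equation
m <- 12                                    # monthly seasonal period
h <- 6                                     # an example forecast horizon
k <- trunc((h - 1) / m)                    # complete years before T + h
drift <- sum(diff(y, lag = m)) / (n - m)   # (1/(T-m)) * sum(y_t - y_{t-m})
y[n + h - m * (k + 1)] + h * drift         # point forecast for horizon h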

fpp3 5.11, Ex 3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

  • The residuals are not centred around 0 (they typically sit slightly below it); this is because the model fails to capture the slight downward trend in the data.
  • Peaks and troughs in the residuals, spaced roughly 4 observations apart, lead to a negative spike at lag 4 in the ACF, so the residuals do not resemble white noise (a Ljung-Box check follows this list). Lags 1 and 3 are also significant, but they are very close to the threshold and of little concern.
  • The distribution of the residuals does not look particularly normal, but it is probably close enough for our prediction intervals to be reasonably accurate (the fact that it is not centred on 0 is more concerning).
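As an extra check (not part of the exercise code), a Ljung-Box test on the innovation residuals gives a formal version of this white-noise assessment, assuming the fpp3 packages are loaded as in the rest of these solutions.

# Extra check: Ljung-Box test on the innovation residuals (lag = 2m = 8 for quarterly data)
augment(fit) |> features(.innov, ljung_box, lag = 8)

A small p-value here would confirm that the residuals are distinguishable from white noise.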
# Look at some forecasts
fit |>
  forecast() |>
  autoplot(recent_production)

The forecasts look reasonable, although the prediction intervals may be a little wide. This is likely due to the slight trend that the model does not capture (which also violates the assumptions imposed on the residuals).
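If we wanted a benchmark that also allows for this slight trend, one option (not required by the exercise) is the seasonal naive method with drift used for the takeaway series in Exercise 1; a rough sketch:

# Sketch: seasonal naive with drift, allowing for the slight trend in the data
recent_production |>
  model(SNAIVE(Beer ~ drift())) |>
  forecast() |>
  autoplot(recent_production)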

fpp3 5.11, Ex 5

Produce forecasts for the 7 Victorian series in aus_livestock using SNAIVE(). Plot the resulting forecasts including the historical data. Is this a reasonable benchmark for these series?

aus_livestock |>
  filter(State == "Victoria") |>
  model(SNAIVE(Count)) |>
  forecast(h = "5 years") |>
  autoplot(aus_livestock)

  • Most of the point forecasts from the seasonal naive method look reasonable.
  • Some series are more seasonal than others, and for the series with very weak seasonality it may be better to consider using a naive or drift method.
  • The prediction intervals in some cases go below zero, so perhaps a log transformation would have been better for these series (a sketch follows this list).
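As a sketch of the last point (not part of the original solution), the transformation can be applied inside the model formula; Victorian pig slaughter is used here purely as an illustration.

# Illustrative sketch: a log transformation keeps the prediction intervals positive
vic_pigs <- aus_livestock |>
  filter(State == "Victoria", Animal == "Pigs")
vic_pigs |>
  model(SNAIVE(log(Count))) |>
  forecast(h = "5 years") |>
  autoplot(vic_pigs)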

fpp3 5.11, Ex 11

We will use the bricks data from aus_production (Australian quarterly clay brick production 1956–2005) for this exercise.

  1. Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)
tidy_bricks <- aus_production |>
  filter(!is.na(Bricks))
tidy_bricks |>
  model(STL(Bricks)) |>
  components() |>
  autoplot()

The seasonal variation is multiplicative (it increases with the level of the series), so a log transformation should be used.

dcmp <- tidy_bricks |>
  model(STL(log(Bricks))) |>
  components()
dcmp |>
  autoplot()

Seasonality varies slightly.

dcmp <- tidy_bricks |>
  model(stl = STL(log(Bricks) ~ season(window = "periodic"))) |>
  components()
dcmp |> autoplot()

The seasonality looks fairly stable, so I’ve used a periodic seasonal window. The decomposition still performs well when the seasonal component is fixed. The remainder term does not appear to contain a substantial amount of seasonality.
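To check this visually (an extra step, not part of the original solution), the seasonal component can be plotted year by year; this assumes the seasonal column produced by STL is named season_year.

# Extra check: how stable is the seasonal pattern from year to year?
dcmp |>
  as_tsibble() |>
  gg_season(season_year)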

  2. Compute and plot the seasonally adjusted data.
dcmp |>
  as_tsibble() |>
  autoplot(season_adjust)

  3. Use a naïve method to produce forecasts of the seasonally adjusted data.
fc <- dcmp |>
  select(-.model) |>
  model(naive = NAIVE(season_adjust)) |>
  forecast(h = "5 years")
dcmp |>
  as_tsibble() |>
  autoplot(season_adjust) + autolayer(fc)

  4. Use decomposition_model() to reseasonalise the results, giving forecasts for the original data.
fit <- tidy_bricks |>
  model(stl_mdl = decomposition_model(STL(log(Bricks)), NAIVE(season_adjust)))
fit |>
  forecast(h = "5 years") |>
  autoplot(tidy_bricks)

  5. Do the residuals look uncorrelated?
fit |> gg_tsresiduals()

The residuals do not appear uncorrelated, as several lags of the ACF exceed the significance threshold.

  6. Repeat with a robust STL decomposition. Does it make much difference?
fit_robust <- tidy_bricks |>
  model(stl_mdl = decomposition_model(STL(log(Bricks), robust = TRUE), NAIVE(season_adjust)))

fit_robust |> gg_tsresiduals()

The residuals appear slightly less autocorrelated, but there is still significant autocorrelation at lag 8.
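To quantify “slightly less autocorrelated” (an extra check, not in the original solution), Ljung-Box statistics for the two fits can be compared.

# Extra check: Ljung-Box test for the original and robust decomposition models
augment(fit) |> features(.innov, ljung_box, lag = 8)
augment(fit_robust) |> features(.innov, ljung_box, lag = 8)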

  7. Compare forecasts from decomposition_model() with those from SNAIVE(), using a test set comprising the last 2 years of data. Which is better?
tidy_bricks_train <- tidy_bricks |>
  slice(1:(n() - 8))
fit <- tidy_bricks_train |>
  model(
    stl_mdl = decomposition_model(STL(log(Bricks)), NAIVE(season_adjust)),
    snaive = SNAIVE(Bricks)
  )

fc <- fit |>
  forecast(h = "2 years")
fc |>
  autoplot(tidy_bricks, level = NULL)

The decomposition forecasts appear to more closely follow the actual future data.

fc |>
  accuracy(tidy_bricks)
# A tibble: 2 × 10
  .model  .type    ME  RMSE   MAE     MPE  MAPE  MASE RMSSE    ACF1
  <chr>   <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
1 snaive  Test  2.75   20    18.2  0.395   4.52 0.504 0.407 -0.0503
2 stl_mdl Test  0.368  18.1  15.1 -0.0679  3.76 0.418 0.368  0.115 

The STL decomposition forecasts are more accurate than the seasonal naive forecasts across all accuracy measures.