Exercise solutions: Section 5.11

Author

Rob J Hyndman and George Athanasopoulos

fpp3 5.11, Ex 1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

  • Australian Population (global_economy)
  • Bricks (aus_production)
  • NSW Lambs (aus_livestock)
  • Household wealth (hh_budget)
  • Australian takeaway food turnover (aus_retail)

Australian population

global_economy |>
  filter(Country == "Australia") |>
  autoplot(Population)

The data have a trend and no seasonality, so a random walk with drift model is appropriate.

global_economy |>
  filter(Country == "Australia") |>
  model(RW(Population ~ drift())) |>
  forecast(h = "10 years") |>
  autoplot(global_economy)

Australian clay brick production

aus_production |>
  filter(!is.na(Bricks)) |>
  autoplot(Bricks) +
  labs(title = "Clay brick production")

This data appears to have more seasonality than trend, so of the models available, seasonal naive is most appropriate.

aus_production |>
  filter(!is.na(Bricks)) |>
  model(SNAIVE(Bricks)) |>
  forecast(h = "5 years") |>
  autoplot(aus_production)

NSW Lambs

nsw_lambs <- aus_livestock |>
  filter(State == "New South Wales", Animal == "Lambs")
nsw_lambs |>
  autoplot(Count)

This data appears to have more seasonality than trend, so of the models available, seasonal naive is most appropriate.

nsw_lambs |>
  model(SNAIVE(Count)) |>
  forecast(h = "5 years") |>
  autoplot(nsw_lambs)

Household wealth

hh_budget |>
  autoplot(Wealth)

This is annual data with an upward trend, so we can use a random walk with drift.

hh_budget |>
  model(RW(Wealth ~ drift())) |>
  forecast(h = "5 years") |>
  autoplot(hh_budget)

Australian takeaway food turnover

takeaway <- aus_retail |>
  filter(Industry == "Takeaway food services") |>
  summarise(Turnover = sum(Turnover))
takeaway |> autoplot(Turnover)

This data has strong seasonality and strong trend, so we will use a seasonal naive model with drift.

takeaway |>
  model(SNAIVE(Turnover ~ drift())) |>
  forecast(h = "5 years") |>
  autoplot(takeaway)

This is actually not one of the four benchmark methods discussed in the book, but is sometimes a useful benchmark when there is strong seasonality and strong trend.

The corresponding equation is \[ \hat{y}_{T+h|T} = y_{T+h-m(k+1)} + \frac{h}{T-m}\sum_{t=m+1}^T(y_t - y_{t-m}), \] where \(m=12\) and \(k\) is the integer part of \((h-1)/m\) (i.e., the number of complete years in the forecast period prior to time \(T+h\)).
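To make the equation concrete, here is a small sketch (not part of the original solution) that computes the drift term and one point forecast by hand, assuming the takeaway tsibble defined above is available; the variable names are illustrative only.

# Hand computation of the SNAIVE-with-drift equation above
y <- takeaway$Turnover
n <- length(y)                             # T in the equation
m <- 12                                    # monthly seasonal period
h <- 6                                     # an example forecast horizon
k <- trunc((h - 1) / m)                    # complete years before T + h
drift <- sum(diff(y, lag = m)) / (n - m)   # (1/(T-m)) * sum(y_t - y_{t-m})
y[n + h - m * (k + 1)] + h * drift         # point forecast for horizon h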

fpp3 5.11, Ex 3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

  • The residuals are not centred around 0 (they typically sit slightly below it); this is because the model fails to capture the slight downward trend in the data.
  • Peaks and troughs in the residuals, spaced roughly 4 observations apart, lead to a negative spike at lag 4 in the ACF, so the residuals do not resemble white noise (a Ljung-Box check follows this list). Lags 1 and 3 are also significant, but they are very close to the threshold and of little concern.
  • The distribution of the residuals does not look particularly normal, but it is probably close enough for our prediction intervals to be reasonably accurate (the fact that it is not centred on 0 is more concerning).
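As an extra check (not part of the exercise code), a Ljung-Box test on the innovation residuals gives a formal version of this white-noise assessment, assuming the fpp3 packages are loaded as in the rest of these solutions.

# Extra check: Ljung-Box test on the innovation residuals (lag = 2m = 8 for quarterly data)
augment(fit) |> features(.innov, ljung_box, lag = 8)

A small p-value here would confirm that the residuals are distinguishable from white noise.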
# Look at some forecasts
fit |>
  forecast() |>
  autoplot(recent_production)

The forecasts look reasonable, although the prediction intervals may be a little wide. This is likely due to the slight trend that the model does not capture (which also violates the assumptions imposed on the residuals).
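If we wanted a benchmark that also allows for this slight trend, one option (not required by the exercise) is the seasonal naive method with drift used for the takeaway series in Exercise 1; a rough sketch:

# Sketch: seasonal naive with drift, allowing for the slight trend in the data
recent_production |>
  model(SNAIVE(Beer ~ drift())) |>
  forecast() |>
  autoplot(recent_production)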

fpp3 5.11, Ex 5

Produce forecasts for the 7 Victorian series in aus_livestock using SNAIVE(). Plot the resulting forecasts including the historical data. Is this a reasonable benchmark for these series?

aus_livestock |>
  filter(State == "Victoria") |>
  model(SNAIVE(Count)) |>
  forecast(h = "5 years") |>
  autoplot(aus_livestock)

  • Most of the point forecasts from the seasonal naive method look reasonable.
  • Some series are more seasonal than others, and for the series with very weak seasonality it may be better to consider using a naive or drift method.
  • The prediction intervals in some cases go below zero, so perhaps a log transformation would have been better for these series (a sketch follows this list).
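As a sketch of the last point (not part of the original solution), the transformation can be applied inside the model formula; Victorian pig slaughter is used here purely as an illustration.

# Illustrative sketch: a log transformation keeps the prediction intervals positive
vic_pigs <- aus_livestock |>
  filter(State == "Victoria", Animal == "Pigs")
vic_pigs |>
  model(SNAIVE(log(Count))) |>
  forecast(h = "5 years") |>
  autoplot(vic_pigs)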

fpp3 5.11, Ex 11

We will use the bricks data from aus_production (Australian quarterly clay brick production 1956–2005) for this exercise.

  1. Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)
tidy_bricks <- aus_production |>
  filter(!is.na(Bricks))
tidy_bricks |>
  model(STL(Bricks)) |>
  components() |>
  autoplot()

The seasonal variation is multiplicative (it increases with the level of the series), so a log transformation should be used.

dcmp <- tidy_bricks |>
  model(STL(log(Bricks))) |>
  components()
dcmp |>
  autoplot()

Seasonality varies slightly.

dcmp <- tidy_bricks |>
  model(stl = STL(log(Bricks) ~ season(window = "periodic"))) |>
  components()
dcmp |> autoplot()

The seasonality looks fairly stable, so I’ve used a periodic seasonal window. The decomposition still performs well when the seasonal component is fixed. The remainder term does not appear to contain a substantial amount of seasonality.
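To check this visually (an extra step, not part of the original solution), the seasonal component can be plotted year by year; this assumes the seasonal column produced by STL is named season_year.

# Extra check: how stable is the seasonal pattern from year to year?
dcmp |>
  as_tsibble() |>
  gg_season(season_year)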

  2. Compute and plot the seasonally adjusted data.
dcmp |>
  as_tsibble() |>
  autoplot(season_adjust)

  3. Use a naïve method to produce forecasts of the seasonally adjusted data.
fc <- dcmp |>
  select(-.model) |>
  model(naive = NAIVE(season_adjust)) |>
  forecast(h = "5 years")
dcmp |>
  as_tsibble() |>
  autoplot(season_adjust) + autolayer(fc)

  4. Use decomposition_model() to reseasonalise the results, giving forecasts for the original data.
fit <- tidy_bricks |>
  model(stl_mdl = decomposition_model(STL(log(Bricks)), NAIVE(season_adjust)))
fit |>
  forecast(h = "5 years") |>
  autoplot(tidy_bricks)

  5. Do the residuals look uncorrelated?
fit |> gg_tsresiduals()

The residuals do not appear uncorrelated, as several lags of the ACF exceed the significance threshold.

  6. Repeat with a robust STL decomposition. Does it make much difference?
fit_robust <- tidy_bricks |>
  model(stl_mdl = decomposition_model(STL(log(Bricks), robust = TRUE), NAIVE(season_adjust)))

fit_robust |> gg_tsresiduals()

The residuals appear slightly less autocorrelated, but there is still significant autocorrelation at lag 8.
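To quantify “slightly less autocorrelated” (an extra check, not in the original solution), Ljung-Box statistics for the two fits can be compared.

# Extra check: Ljung-Box test for the original and robust decomposition models
augment(fit) |> features(.innov, ljung_box, lag = 8)
augment(fit_robust) |> features(.innov, ljung_box, lag = 8)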

  7. Compare forecasts from decomposition_model() with those from SNAIVE(), using a test set comprising the last 2 years of data. Which is better?
tidy_bricks_train <- tidy_bricks |>
  slice(1:(n() - 8))
fit <- tidy_bricks_train |>
  model(
    stl_mdl = decomposition_model(STL(log(Bricks)), NAIVE(season_adjust)),
    snaive = SNAIVE(Bricks)
  )

fc <- fit |>
  forecast(h = "2 years")
fc |>
  autoplot(tidy_bricks, level = NULL)

The decomposition forecasts appear to more closely follow the actual future data.

fc |>
  accuracy(tidy_bricks)
# A tibble: 2 × 10
  .model  .type    ME  RMSE   MAE     MPE  MAPE  MASE RMSSE    ACF1
  <chr>   <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
1 snaive  Test  2.75   20    18.2  0.395   4.52 0.504 0.407 -0.0503
2 stl_mdl Test  0.368  18.1  15.1 -0.0679  3.76 0.418 0.368  0.115 

The STL decomposition forecasts are more accurate than the seasonal naive forecasts across all accuracy measures.