Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
Use the ETS() function in R to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.
fit <- aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  model(ses = ETS(Count ~ error("A") + trend("N") + season("N")))
report(fit)
The optimal values are \(\alpha = 0.3221247\) and \(\ell_0 = 100646.6\).
fc <- fit |>
  forecast(h = "4 months")
fc
# A fable: 4 x 6 [1M]
# Key: Animal, State, .model [1]
Animal State .model Month Count .mean
<fct> <fct> <chr> <mth> <dist> <dbl>
1 Pigs Victoria ses 2019 Jan N(95187, 8.7e+07) 95187.
2 Pigs Victoria ses 2019 Feb N(95187, 9.7e+07) 95187.
3 Pigs Victoria ses 2019 Mar N(95187, 1.1e+08) 95187.
4 Pigs Victoria ses 2019 Apr N(95187, 1.1e+08) 95187.
fc |>
  autoplot(filter(aus_livestock, Month >= yearmonth("2010 Jan")))
Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
s <- augment(fit) |>
  pull(.resid) |>
  sd()
yhat <- fc |>
  pull(.mean) |>
  head(1)
yhat + c(-1, 1) * 1.96 * s
[1] 76871.01 113502.10
fc |>
  head(1) |>
  mutate(interval = hilo(Count, 95)) |>
  pull(interval)
<hilo[1]>
[1] [76854.79, 113518.3]95
The intervals are close but not identical. This is because R estimates the variance of the residuals differently, taking account of the degrees of freedom properly (and also using a more accurate critical value rather than just 1.96).
Try the following.
res <- augment(fit) |>
  pull(.resid)
s <- sqrt(sum(res^2) / (length(res) - NROW(tidy(fit))))
yhat + c(-1, 1) * qnorm(0.975) * s
[1] 76854.79 113518.33
fpp3 8.8, Ex2
Write your own function to implement simple exponential smoothing. The function should take arguments y (the response data), alpha (the smoothing parameter \(\alpha\)) and level (the initial level \(\ell_0\)). It should return the forecast of the next observation in the series. Does it give the same forecast as ETS()?
Yes, essentially the same forecast is obtained. The slight differences are due to rounding of \(\alpha\) and \(\ell_0\).
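One possible implementation is sketched below. The function name is my own; the arguments follow the exercise statement:

```r
# Simple exponential smoothing by hand: y is the response data,
# alpha the smoothing parameter, level the initial level l0.
my_ses <- function(y, alpha, level) {
  l <- level
  for (t in seq_along(y)) {
    # Level update: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
    l <- alpha * y[t] + (1 - alpha) * l
  }
  l  # the one-step-ahead forecast is the final level
}
```

Calling it with the optimal values reported above, e.g. `my_ses(vic_pigs$Count, alpha = 0.3221247, level = 100646.6)` (where `vic_pigs` holds the filtered Victorian pigs series), should reproduce the `ETS()` forecast up to rounding.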
fpp3 8.8, Ex3
Modify your function from the previous exercise to return the sum of squared errors rather than the forecast of the next observation. Then use the optim() function to find the optimal values of \(\alpha\) and \(\ell_0\). Do you get the same values as the ETS() function?
# A tibble: 2 × 5
Animal State .model term estimate
<fct> <fct> <chr> <chr> <dbl>
1 Pigs Victoria ses alpha 0.322
2 Pigs Victoria ses l[0] 100647.
The estimates are similar, but not identical. This is due to the different starting values used by the two optimisations.
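A sketch of the modified function and the optim() call (the helper name and starting values are my own):

```r
# SSE of the one-step-ahead SES forecast errors; par = c(alpha, l0).
ses_sse <- function(par, y) {
  alpha <- par[1]
  l <- par[2]
  sse <- 0
  for (t in seq_along(y)) {
    sse <- sse + (y[t] - l)^2            # one-step forecast error
    l <- alpha * y[t] + (1 - alpha) * l  # level update
  }
  sse
}

# Arbitrary starting values; with the Victorian pigs series this
# should land close to the ETS() estimates shown above.
# opt <- optim(par = c(0.5, mean(vic_pigs$Count)), fn = ses_sse, y = vic_pigs$Count)
# opt$par
```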
fpp3 8.8, Ex4
Combine your previous two functions to produce a function that both finds the optimal values of \(\alpha\) and \(\ell_0\), and produces a forecast of the next observation in the series.
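A self-contained sketch of such a function (all names and starting values are my own, not from the text):

```r
# Optimise (alpha, l0) by minimising the SSE of one-step SES errors,
# then forecast the next observation with the fitted values.
ses_fit_forecast <- function(y) {
  sse <- function(par) {
    alpha <- par[1]
    l <- par[2]
    s <- 0
    for (t in seq_along(y)) {
      s <- s + (y[t] - l)^2
      l <- alpha * y[t] + (1 - alpha) * l
    }
    s
  }
  # Unconstrained Nelder-Mead; in practice you might constrain
  # alpha to [0, 1] using method = "L-BFGS-B".
  opt <- optim(c(0.5, y[1]), sse)
  alpha <- opt$par[1]
  l <- opt$par[2]
  for (t in seq_along(y)) l <- alpha * y[t] + (1 - alpha) * l
  list(alpha = alpha, l0 = opt$par[2], forecast = l)
}
```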
fpp3 8.8, Ex5
There is a huge jump in Exports in 2002, due to the devaluation of the Argentinian peso. Since then, Exports (as a percentage of GDP) have gradually returned to 1990 levels.
Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
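The accuracy comparison below was presumably produced by something along these lines (the model labels ses and holt match the output that follows):

```r
library(fpp3)

# Fit ETS(A,N,N) and ETS(A,A,N) to Argentinian exports.
fit <- global_economy |>
  filter(Country == "Argentina") |>
  model(
    ses = ETS(Exports ~ error("A") + trend("N") + season("N")),
    holt = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )
accuracy(fit)
```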
# A tibble: 2 × 11
Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Argentina ses Training 0.0762 2.78 1.62 -1.73 15.7 0.983 0.986 0.00902
2 Argentina holt Training 0.00795 2.78 1.64 -2.51 15.9 0.994 0.986 0.0271
There is very little difference in training RMSE between these models. So the extra parameter is not doing much.
Compare the forecasts from both methods. Which do you think is best?
fit |>
  forecast(h = 10) |>
  autoplot(global_economy)
The forecasts are similar. In this case, the simpler model is preferred.
Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
# standard error (from RMSE)
s <- accuracy(fit) |> pull(RMSE)
# mean (from forecast)
yhat <- forecast(fit, h = 1) |> pull(.mean)
# SES
yhat[1] + c(-1, 1) * qnorm(0.975) * s[1]
[1] 5.882074 16.764136
# Holt
yhat[2] + c(-1, 1) * qnorm(0.975) * s[2]
[1] 5.989515 16.872908
fit |>
  forecast(h = 1) |>
  mutate(PI = hilo(Exports, level = 95))
# A fable: 2 x 6 [1Y]
# Key: Country, .model [2]
Country .model Year Exports .mean PI
<fct> <chr> <dbl> <dist> <dbl> <hilo>
1 Argentina ses 2018 N(11, 8) 11.3 [5.785765, 16.86044]95
2 Argentina holt 2018 N(11, 8.3) 11.4 [5.791571, 17.07085]95
The intervals computed from the RMSE are slightly narrower than those given by the hilo() function.
Using the RMSE fails to take account of the degrees of freedom for each model. Compare the following.