Exercise solutions: Section 3.7
fpp3 3.7, Ex 2
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
- United States GDP from global_economy.
- Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
- Victorian Electricity Demand from vic_elec.
- Gas production from aus_production.
United States GDP
us_economy <- global_economy |>
  filter(Country == "United States")
us_economy |>
  autoplot(GDP)
- The trend appears exponential, so a transformation would be useful.
us_economy |>
  autoplot(box_cox(GDP, 0))
- A log transformation (Box-Cox with \(\lambda = 0\)) appears slightly too strong.
us_economy |>
  autoplot(box_cox(GDP, 0.3))
- Using \(\lambda = 0.3\) looks pretty good: the trend is now almost linear.
Let’s see what Guerrero’s method suggests.
us_economy |>
  features(GDP, features = guerrero)
# A tibble: 1 × 2
Country lambda_guerrero
<fct> <dbl>
1 United States 0.282
This is pretty close to \(\lambda = 0.3\). Let’s see how it looks:
us_economy |>
  autoplot(box_cox(GDP, 0.2819714))
- More or less the same. The result of a Box-Cox transformation is usually insensitive to small changes in \(\lambda\).
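For reference, the family of transformations applied by box_cox() above can be written as follows. This is the form given in the fpp3 textbook, where \(y_t\) is the original observation and \(w_t\) the transformed one; the sign function is what allows negative values when \(\lambda > 0\).

```latex
w_t =
\begin{cases}
  \log(y_t) & \text{if } \lambda = 0, \\[4pt]
  \dfrac{\operatorname{sign}(y_t)\,|y_t|^{\lambda} - 1}{\lambda} & \text{otherwise.}
\end{cases}
```

So \(\lambda = 0\) is the natural log, \(\lambda = 1\) only shifts the series down by one without changing its shape, and values in between give progressively weaker transformations than the log.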
Slaughter of Victorian “Bulls, bullocks and steers”
vic_bulls <- aus_livestock |>
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers")
vic_bulls |>
  autoplot(Count)
- The variability of the series appears to change slightly with the number of bulls slaughtered in Victoria.
- A transformation may be useful.
vic_bulls |>
  autoplot(log(Count))
- A log transformation (Box-Cox \(\lambda = 0\)) appears to stabilise most of the variation. Let’s check with Guerrero’s method.
vic_bulls |>
  features(Count, features = guerrero)
# A tibble: 1 × 3
Animal State lambda_guerrero
<fct> <fct> <dbl>
1 Bulls, bullocks and steers Victoria -0.0446
- Pretty close: Guerrero’s method suggests \(\lambda = -0.045\). This is close enough to zero that it is probably best to just use a log transformation, which is also easier to interpret.
Victorian Electricity Demand
vic_elec |>
  autoplot(Demand)
The time-of-day seasonal pattern is hidden by the density of the plot.
Day-of-week seasonality is just visible.
Time-of-year seasonality is clear, with increasing variance in winter and high skewness in summer.
vic_elec |>
  autoplot(box_cox(Demand, 0))
A log transformation makes the variance more even and reduces the skewness.
Guerrero’s method doesn’t work here as there are several types of seasonality.
Australian Gas production
aus_production |>
  autoplot(Gas)
- Variation in seasonal pattern grows proportionally to the amount of gas produced in Australia. A transformation should work well here.
aus_production |>
  autoplot(box_cox(Gas, 0))
- A log transformation appears slightly too strong: the variation in periods with lower gas production is now larger than the variation in periods with higher gas production.
aus_production |>
  features(Gas, features = guerrero)
# A tibble: 1 × 1
lambda_guerrero
<dbl>
1 0.110
- Guerrero’s method agrees by selecting a slightly weaker transformation. Let’s see how it looks.
aus_production |>
  autoplot(box_cox(Gas, 0.1095))
Looking good! The variation is now constant across the series.
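Rather than copying the value of \(\lambda\) by hand, the estimate can be pulled out of the features() result and passed straight to box_cox(). The following is a minimal sketch of that pattern; the variable name lambda is only illustrative, and the estimate may differ slightly between package versions.

```r
library(fpp3)

# Extract the Guerrero estimate as a plain number.
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

# Reuse it in the transformation instead of typing the value.
aus_production |>
  autoplot(box_cox(Gas, lambda))
```

This keeps the plot consistent with whatever value the feature returns, which is convenient if the data are updated.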
fpp3 3.7, Ex 3
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
canadian_gas |>
  autoplot(Volume) +
  labs(
    x = "Year", y = "Gas production (billion cubic meters)",
    title = "Monthly Canadian gas production"
  )
Here the variation in the series is not proportional to the amount of gas production in Canada.
When either small or large amounts of gas are being produced, the variation in the seasonal pattern is small.
However, between 1975 and 1990 the gas production is moderate, and the variation is large.
Power transformations (like the Box-Cox transformation) require the variability of the series to vary proportionately to the level of the series.
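One rough way to check this is to summarise each calendar year by its mean and its standard deviation and plot one against the other. The sketch below is only an illustration: the names yearly, level and spread are made up here, and the within-year standard deviation is a crude proxy for the size of the seasonal pattern because it also picks up any trend within the year.

```r
library(fpp3)

# Summarise each calendar year by its average level and its spread.
yearly <- canadian_gas |>
  index_by(Year = year(Month)) |>
  summarise(
    level = mean(Volume),
    spread = sd(Volume)
  ) |>
  as_tibble()

# For a power transformation to help, spread should rise steadily with level.
yearly |>
  ggplot(aes(x = level, y = spread)) +
  geom_point() +
  labs(x = "Yearly mean volume", y = "Yearly standard deviation")
```

If the points rise and then fall again as the level increases, no single value of \(\lambda\) can even out the variation.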
fpp3 3.7, Ex 10
This exercise uses the canadian_gas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).
- Plot the data using autoplot(), gg_subseries() and gg_season() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?
canadian_gas |> autoplot(Volume)
canadian_gas |> gg_subseries(Volume)
canadian_gas |> gg_season(Volume)
- The changes in seasonality are possibly due to changes in the regulation of gas prices.
- Do an STL decomposition of the data. You will need to choose a seasonal window to allow for the changing shape of the seasonal component.
fit <- canadian_gas |>
  model(STL(Volume)) |>
  components()
fit
# A dable: 542 x 7 [1M]
# Key: .model [1]
# : Volume = trend + season_year + remainder
.model Month Volume trend season_year remainder season_adjust
<chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
1 STL(Volume) 1960 Jan 1.43 1.08 0.520 -0.172 0.911
2 STL(Volume) 1960 Feb 1.31 1.11 0.215 -0.0178 1.09
3 STL(Volume) 1960 Mar 1.40 1.13 0.307 -0.0395 1.09
4 STL(Volume) 1960 Apr 1.17 1.16 0.0161 -0.00627 1.15
5 STL(Volume) 1960 May 1.12 1.18 -0.116 0.0476 1.23
6 STL(Volume) 1960 Jun 1.01 1.21 -0.356 0.159 1.37
7 STL(Volume) 1960 Jul 0.966 1.23 -0.403 0.136 1.37
8 STL(Volume) 1960 Aug 0.977 1.26 -0.349 0.0677 1.33
9 STL(Volume) 1960 Sep 1.03 1.28 -0.340 0.0870 1.37
10 STL(Volume) 1960 Oct 1.25 1.31 -0.0899 0.0329 1.34
# ℹ 532 more rows
names(fit)
[1] ".model" "Month" "Volume" "trend"
[5] "season_year" "remainder" "season_adjust"
fit |> autoplot()
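The exercise asks for a seasonal window to be chosen, whereas the fit above relies on the defaults of STL(). A minimal sketch of setting it explicitly is given below. The window of 7 and the use of robust = TRUE are illustrative assumptions rather than recommended values, and fit_window is a made-up name; a smaller window lets the seasonal shape change more quickly.

```r
library(fpp3)

# A shorter seasonal window allows the seasonal component to evolve faster.
fit_window <- canadian_gas |>
  model(STL(Volume ~ season(window = 7), robust = TRUE)) |>
  components()

fit_window |> autoplot()
```

Trying a few odd values for the window and comparing the remainder panels is a reasonable way to settle on one.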
- How does the seasonal shape change over time? [Hint: Try plotting the seasonal component using gg_season().]
fit |> gg_season(season_year)
- Here the changes are easier to see. Up to about 1990 there is strong seasonality with the greatest volume in the Canadian winter.
- The seasonality increases in size over time. After 1990 the seasonality changes shape and appears to be driven partly by the month length near the end of the series.
- Can you produce a plausible seasonally adjusted series?
canadian_gas |>
  autoplot(Volume) +
  autolayer(fit, season_adjust, col = "blue")
- Compare the results with those obtained using SEATS and X11. How are they different?
# remember to load library(seasonal) before attempting this question!
canadian_gas |>
  model(X_13ARIMA_SEATS(Volume ~ seats())) |>
  components() |>
  autoplot()
canadian_gas |>
  model(X_13ARIMA_SEATS(Volume ~ x11())) |>
  components() |>
  autoplot()
Note that SEATS fits a multiplicative decomposition by default, so it is hard to directly compare the results with the other two methods.
The X11 seasonal component is quite similar to the STL seasonal component. Both SEATS and X11 have estimated a more wiggly trend line than STL.
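The difficulty in comparing the methods comes down to the form of the decomposition. In the notation of the fpp3 textbook (\(S_t\) seasonal, \(T_t\) trend-cycle, \(R_t\) remainder), the additive and multiplicative forms are as follows. Taking logs of a multiplicative decomposition gives an additive one, which is one way the components could be put on a comparable footing.

```latex
\begin{align*}
\text{Additive:} \quad & y_t = S_t + T_t + R_t \\
\text{Multiplicative:} \quad & y_t = S_t \times T_t \times R_t \\
\text{Equivalently:} \quad & \log y_t = \log S_t + \log T_t + \log R_t
\end{align*}
```

In the multiplicative form the seasonal component is a ratio centred on 1, whereas in the additive form it is measured in the units of the data and centred on 0.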
Take-home message: SEATS fits a multiplicative decomposition by default; X11 is based on moving averages.