This is a follow-up to my previous post. This time I will take a closer look at fitting ARIMA models in R, using the same data set.
Stationarity and Differencing
Time series textbooks stress that data needs to be stationary, meaning that the series fluctuates about a constant mean and exhibits constant variance. Data with a strong trend should be differenced or de-trended; a first difference usually does the trick. De-trending is achieved by smoothing the data (using polynomials, splines, exponentially weighted moving averages, or a LOWESS curve) and subtracting the smoothed data from the original data. Data that exhibits unequal variance can sometimes be cleaned up by applying a log transform, or by taking some root of the data.
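As a minimal sketch of these transforms (on simulated data, since they are generic), a log transform followed by a first difference looks like this in base R:

```r
# Simulated series with an exponential trend and variance that grows
# with the level -- a common pattern in price data.
set.seed(42)
n <- 200
x <- exp(0.01 * (1:n) + cumsum(rnorm(n, sd = 0.02)))

x_log  <- log(x)       # log transform stabilizes the variance
x_diff <- diff(x_log)  # first difference removes the trend

# diff() drops one observation, so the result is one element shorter
length(x_diff)  # n - 1
```

The order matters: take the log first (to even out the variance), then difference the logged series.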
If you decide that your data needs to be differenced, let the forecast::Arima() function do the differencing internally. Suppose you have called the forecast::tsdisplay() function on your differenced data and decided that the differenced data is best described by an AR(2) process. Then you can create a model by calling the following function on your original, undifferenced data,
fit <- forecast::Arima( data, order=c(2,1,0) )
The 1 in the order=c(2,1,0) argument tells the function to apply a first-order difference. It is preferable to use the forecast::Arima() function over the built-in arima() function, as it returns more information that is useful for forecasting. Robert Hyndman explains the advantages of differencing via the order argument in this CrossValidated post.
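To see why the two approaches agree on the fitted coefficients (while differing in what they return for forecasting), here is a sketch using the built-in arima() on simulated data; forecast::Arima() treats the order argument the same way:

```r
# Simulate an ARIMA(2,1,0) process: cumulative sum of an AR(2) series
set.seed(1)
x <- cumsum(arima.sim(model = list(ar = c(0.3, 0.2)), n = 300))

# Option 1: difference by hand, then fit AR(2) with no mean term
fit.manual   <- arima(diff(x), order = c(2, 0, 0), include.mean = FALSE)

# Option 2: let arima() difference internally via the middle order term
fit.internal <- arima(x, order = c(2, 1, 0))

# The AR coefficients agree; only option 2 produces forecasts on the
# original, undifferenced scale
coef(fit.manual)
coef(fit.internal)
```

The practical difference shows up at forecast time: option 2 knows the series was differenced and automatically integrates the forecasts back to the original scale, while option 1 would leave you forecasting differences.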
The auto.arima() function uses the Hyndman-Khandakar algorithm to choose an ARIMA model. Since you're a human, you should also try other models, but this is a really good start. Ideally, you want the residuals of your fitted model to look normally distributed and to have an ACF that looks like white noise, i.e., no significant correlation at lags > 0.
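A quick way to check the white-noise condition is base R's Box.test(), which runs a Ljung-Box test for leftover autocorrelation (the forecast package's checkresiduals() bundles the same test with residual plots). A sketch on a base-R arima() fit of simulated data:

```r
set.seed(7)
x <- arima.sim(model = list(ar = 0.5), n = 250)
fit <- arima(x, order = c(1, 0, 0))

# Visual checks: roughly bell-shaped histogram, no significant ACF spikes
hist(fit$residuals, main = "Residuals")
acf(fit$residuals, main = "ACF of Residuals")

# Ljung-Box test: a large p-value means no evidence of leftover
# autocorrelation, i.e. the residuals look like white noise
Box.test(fit$residuals, lag = 10, type = "Ljung-Box", fitdf = 1)
```

Here fitdf is set to the number of fitted AR/MA parameters, so the test's degrees of freedom are adjusted for the model.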
I listed the results of ARIMA(1,0,0) and ARIMA(2,0,0) forecasts in my previous post. Since then, I decided to see what a forecast based on differenced data looks like. For the record, the forecast::auto.arima() function suggested an ARIMA(2,0,0) model.
I decided to difference my data because the residuals look more normal to me after differencing.
data <- read.csv('prices_last_three_years.csv')
WTI <- data$WTI
fit.210 <- forecast::Arima( WTI, order=c(2,1,0) )
hist( fit.210$residuals, main="ARIMA(2,1,0) Residuals" )
Also, the ACF of the residuals has slightly lower values for all lags.
forecast::Acf( fit.210$residuals, main="ACF of ARIMA(2,1,0) Residuals" )
This is the summary of the forecast,
summary( forecast( fit.210, h=4 ) )
Forecast method: ARIMA(2,1,0)

Model Information:
Series: WTI
ARIMA(2,1,0)

Coefficients:
         ar1     ar2
      0.2829  0.0111
s.e.  0.0853  0.0856

sigma^2 estimated as 4.399:  log likelihood=-308.87
AIC=623.75   AICc=623.92   BIC=632.64

Error measures:
                     ME     RMSE      MAE        MPE     MAPE      MASE
Training set -0.1284872 2.090164 1.602199 -0.1600143 1.684015 0.9530948
                    ACF1
Training set -0.01321749

Forecasts:
      Pt Forecast  Lo 80  Hi 80  Lo 95  Hi 95
11/21       75.97  73.28  78.66  71.86  80.08
11/28       75.80  71.43  80.17  69.11  82.49
12/05       75.75  70.02  81.47  66.99  84.50
12/12       75.73  68.88  82.58  65.25  86.21
This is a plot of the ARIMA(2,1,0) forecast,
plot( forecast( fit.210 ) )
In contrast, here is the plot for the ARIMA(2,0,0) forecast,
So, we’ve done several forecasts over the last two posts, and they’re all different. The Holt-Winters forecast predicts that the price of WTI Crude will fall, the ARIMA(2,1,0) predicts that it will hold what it’s got, and the ARIMA(2,0,0) predicts that it will probably climb back up in the coming weeks. The difficulty is that oil prices are affected by a huge number of variables, not the least of which are rig counts and wind conditions. It would be interesting to incorporate rig counts into the forecasting process, but I don’t have access to those numbers.
This is another great resource on ARIMA modeling and forecasting.