# ARIMA Forecasting in R

This is a follow-up to my previous post. Here I take a closer look at fitting ARIMA models in R, using the same data set.

# Stationarity and Differencing

Time series textbooks stress that data needs to be stationary, meaning that the series fluctuates about a constant mean and exhibits constant variance. Data that has a strong trend should be differenced or de-trended. De-trending is achieved by smoothing the data (using polynomials, splines, exponentially weighted moving averages, or a LOWESS curve) and subtracting the smoothed data from the original data. A first difference usually does the trick. Data that exhibits unequal variance can sometimes be cleaned up by applying a log transform, or by taking some root of the data.
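A quick way to eyeball these transforms is to plot the raw, differenced, and log-differenced series side by side with their ACF and PACF plots. This is a sketch using the same CSV file and `WTI` column that appear later in the post:

```r
library(forecast)

# Load the WTI price series (same file used later in this post)
WTI <- read.csv('prices_last_three_years.csv')$WTI

# tsdisplay() shows the series together with its ACF and PACF in one figure
forecast::tsdisplay( WTI,            main="Raw series" )
forecast::tsdisplay( diff(WTI),      main="First difference" )
forecast::tsdisplay( diff(log(WTI)), main="Differenced log series" )  # also stabilizes variance
```

If the first difference already wanders about a constant mean with a quickly decaying ACF, there is no need to reach for the log transform.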

If you decide that your data needs to be differenced, let the `forecast::Arima()` function do the differencing internally. Suppose you have called the `forecast::tsdisplay()` function on your differenced data and you decide that your differenced data is best described by an AR(2) process. Then you may create a model by calling the following function on your original, undifferenced data,

```
fit <- forecast::Arima( data, order=c(2,1,0) )
```

The `1` in the `order=c(2,1,0)` argument informs the function that you’d like to go ahead and use a first order difference. It is preferable to use the `forecast::Arima()` function over the built-in `arima()` function, as it returns more information for forecasting. Robert Hyndman explains the advantages of differencing by using the order argument in this CrossValidated post.

# Using auto.arima()

The `auto.arima()` function uses the Hyndman-Khandakar algorithm to decide on an ARIMA model. Since you’re a human, you should also try other models, but this is a really good start. Ideally, you want the residuals of your fitted model to look normally distributed and to have an ACF that looks like white noise, i.e., no significant correlation at lags > 0.
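A minimal session illustrating this workflow, using the same data as the rest of the post (the residual checks mirror the diagnostics used below):

```r
library(forecast)

WTI <- read.csv('prices_last_three_years.csv')$WTI

# Let the Hyndman-Khandakar algorithm pick orders for p, d, and q
fit <- auto.arima( WTI )
summary( fit )

# Residual diagnostics: roughly bell-shaped histogram, white-noise ACF
hist( fit$residuals, main="auto.arima() Residuals" )
forecast::Acf( fit$residuals, main="ACF of auto.arima() Residuals" )
```

From here you can hand-fit nearby models with `forecast::Arima()` and compare their AICc values against the automatic pick.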

# Forecasting

I listed the results of an ARIMA(1,0,0) and ARIMA(2,0,0) forecast in my previous post. Since then, I decided to see what a forecast based on differenced data looks like. For the record, the `forecast::auto.arima()` function suggested an ARIMA(2,0,0) model.

I decided to difference my data because the residuals look more normal to me after differencing.

```
data <- read.csv('prices_last_three_years.csv')
WTI <- data$WTI
fit.210 <- forecast::Arima( WTI, order=c(2,1,0) )
hist( fit.210$residuals, main="ARIMA(2,1,0) Residuals" )
```

Also, the ACF of the residuals has slightly lower values for all lags.

```
forecast::Acf( fit.210$residuals, main="ACF of ARIMA(2,1,0) Residuals" )
```

This is the summary of the forecast,

```
summary( forecast( fit.210, h=4 ) )
```
```
Forecast method: ARIMA(2,1,0)

Model Information:
Series: WTI
ARIMA(2,1,0)

Coefficients:
         ar1     ar2
      0.2829  0.0111
s.e.  0.0853  0.0856

sigma^2 estimated as 4.399:  log likelihood=-308.87
AIC=623.75   AICc=623.92   BIC=632.64

Error measures:
                     ME     RMSE      MAE        MPE     MAPE      MASE        ACF1
Training set -0.1284872 2.090164 1.602199 -0.1600143 1.684015 0.9530948 -0.01321749

Forecasts:
      Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
11/21          75.97 73.28 78.66 71.86 80.08
11/28          75.80 71.43 80.17 69.11 82.49
12/05          75.75 70.02 81.47 66.99 84.50
12/12          75.73 68.88 82.58 65.25 86.21
```

This is a plot of the ARIMA(2,1,0) forecast,

```
plot( forecast( fit.210 ) )
```

In contrast, here is the plot for the ARIMA(2,0,0) forecast.

# Conclusion

So, we’ve done several forecasts over the last two posts, and they’re all different. The Holt-Winters forecast predicts that the price of WTI Crude will fall, the ARIMA(2,1,0) predicts that it will hold what it’s got, and the ARIMA(2,0,0) predicts that it will probably climb back up in the coming weeks. The difficulty is that oil prices are affected by a huge number of variables, not the least of which are rig counts and wind conditions. It would be interesting to incorporate rig counts into the forecasting process, but I don’t have access to those numbers.