A friend recently made a prediction about the price of oil for the next three months. I thought I would perform some time series forecasting on West Texas Intermediate (WTI) prices and see if his numbers were reasonable from a dumb-numbers, canned-forecasting perspective. I’m not making the claim that one can reasonably and accurately forecast oil prices with traditional time series techniques. (That’s bogus.) I’m simply doing this to learn more about forecasting.
Monthly and weekly petroleum prices can be found at the Energy Information Administration (EIA). Ever relevant, Wikipedia has a great write-up on recent trends in oil prices. There is also this Times article on the 2008 spike and drop, which offered this apt summary:
[Oil prices are] the product of an extremely volatile mixture of speculation, oil production, weather, government policies, the global economy, the number of miles the average American is driving in any given week and so on. But the daily price is actually set — or discovered, in economic parlance — on the futures exchange.
Data
If you go to the EIA site, you can download monthly and weekly prices in an Excel spreadsheet. The WTI and Brent oil prices are on the second sheet of the Excel file. We can use Python to extract this data from Excel and put it into a Pandas DataFrame for easier processing.
import pandas
import seaborn  # imported for its plot styling

# create an Excel file object
excel = pandas.ExcelFile( '/Users/connorjohnson/Downloads/PET_PRI_SPT_S1_W.xls' )
# parse the second sheet, which holds the spot prices
df = excel.parse( excel.sheet_names[1] )
# rename the columns
df = df.rename( columns=dict( zip( df.columns, ['Date','WTI','Brent'] ) ) )
# cut off the first 18 rows because these rows
# contain NaN values for the Brent prices
df = df[18:]
# index the data set by date
df.index = df['Date']
# remove the date column after re-indexing
df = df[['WTI','Brent']]
Visualization
We can then visualize the data with seaborn. So, the takeaway from the big picture here is that oil prices have gone totally bonkers over the last ten years.
from matplotlib.pyplot import title, xlabel, ylabel, savefig

# plot every fourth weekly observation
df[['WTI','Brent']][::4].plot()
title('Crude Oil Prices')
xlabel('Year')
ylabel('Price [USD]')
savefig("wti_and_brent_all_data.png", dpi=200)
If we focus on the last three years, we get the following picture. The interesting thing about this perspective, apart from the recent marked drop, is that the WTI and Brent prices appear to converge over time.
We can dump the last three years of weekly data to a CSV file, for use in R, with the following line:
# keep roughly the last three years of weekly prices (36 months x ~4 weeks)
df[-36*4:].to_csv('prices_last_three_years.csv')
Seasonality
Rob Hyndman is the creator of the R forecast package, and you should really check out his blog. I found this post on R-bloggers where Hyndman proposed an interesting technique for detecting seasonality. Essentially, you fit one model that is allowed a seasonal component and another that is not, and then compare the two with a likelihood-ratio test: twice the difference in log-likelihoods is referred to a chi-squared distribution, with degrees of freedom equal to the difference in the number of fitted parameters.
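Here is a minimal sketch of that test, in the spirit of Hyndman's post. The vector wti.monthly is a hypothetical monthly WTI series (monthly prices are also in the EIA file), and the 12-month frequency is my assumption; ets() ignores seasonal periods longer than 24, so the weekly series would need to be aggregated or handled differently.

library(forecast)

# wti.monthly: hypothetical vector of monthly WTI prices (an assumption,
# not constructed anywhere in this post); declare a 12-month seasonal period
wti <- ts( wti.monthly, frequency=12 )

# one model that may include a seasonal component, one forced to exclude it
fit.seasonal    <- ets( wti )                # error/trend/season chosen automatically
fit.nonseasonal <- ets( wti, model="ZZN" )   # season forced to "N" (none)

# likelihood-ratio comparison: twice the gain in log-likelihood, referred to
# a chi-squared distribution with df = difference in parameter counts
deviance <- 2 * c( logLik(fit.seasonal) - logLik(fit.nonseasonal) )
df <- attributes( logLik(fit.seasonal) )$df - attributes( logLik(fit.nonseasonal) )$df
1 - pchisq( deviance, df )   # a small p-value favours keeping the seasonal component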
For this data, I got errors attempting the decompose() and seasonal HoltWinters() functions; strictly speaking, the errors complain that the series has no defined seasonal period (read.csv returns a plain vector), so I simply proceed without a seasonal component.
data <- read.csv('prices_last_three_years.csv')
decompose( data$WTI )
Error in decompose(data$WTI) : time series has no or less than 2 periods
HoltWinters( data$WTI )
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : time series has no or less than 2 periods
Holt-Winters Forecasting
I used this resource for learning about Holt-Winters forecasting. We can fit a model and make predictions using the Holt-Winters method with the following lines of code:
# load Hyndman's forecast package
library(forecast)
# create a model with a trend but no seasonal component
WTI.hwm <- HoltWinters( data$WTI, gamma=FALSE )
# predict the values for the next 4 weeks
WTI.hwf <- forecast.HoltWinters( WTI.hwm, h=4 )
# print a summary of the forecast
summary( WTI.hwf )
Forecast method: HoltWinters

Model Information:
Holt-Winters exponential smoothing with trend and without seasonal component.

Call:
HoltWinters(x = data$WTI, gamma = FALSE)

Smoothing parameters:
 alpha: 1
 beta : 0.3259544
 gamma: FALSE

Coefficients:
        [,1]
a  76.500000
b  -2.011559

Error measures:
                     ME     RMSE      MAE        MPE     MAPE     MASE     ACF1
Training set -0.1612072 2.271556 1.785524 -0.1085958 1.855251 1.068586 0.104531

Forecasts:
    Pt Forecast Lo 80 Hi 80 Lo 95 Hi 95
145       74.48 71.57 77.40 70.03 78.94
146       72.47 67.63 77.31 65.07 79.87
147       70.46 63.63 77.29 60.02 80.90
148       68.45 59.51 77.38 54.79 82.11
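To interpret those coefficients: with gamma=FALSE, HoltWinters() reduces to Holt's double exponential smoothing, which maintains a level and a trend; the reported a and b are the final level and trend, and the h-step-ahead forecast is their linear extrapolation. In the usual notation,

$$
\begin{aligned}
\ell_t &= \alpha\, y_t + (1-\alpha)\,(\ell_{t-1} + b_{t-1}),\\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1},\\
\hat{y}_{t+h} &= \ell_t + h\, b_t.
\end{aligned}
$$

With a = 76.5 and b ≈ -2.01, the one-week-ahead forecast is 76.5 - 2.01 ≈ 74.48, which matches the first row above. Note also that alpha = 1 means the fitted level simply tracks the most recent observation.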
For reference, I’ll replace the forecast indices with dates. You’ll note that today is 11/23/2014 and that the first predicted data point is for 11/21/2014; there’s a lag between production and reporting, so the 11/21/2014 numbers won’t be published for a few days yet.
           Pt Forecast Lo 80 Hi 80 Lo 95 Hi 95
11/21/2014       74.48 71.57 77.40 70.03 78.94
11/28/2014       72.47 67.63 77.31 65.07 79.87
12/05/2014       70.46 63.63 77.29 60.02 80.90
12/12/2014       68.45 59.51 77.38 54.79 82.11
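If you want to build that table programmatically rather than by hand, a small sketch might look like the following (the start date, the weekly spacing, and the choice of the 95% interval columns are my assumptions):

# attach weekly dates to the point forecasts and 95% intervals by hand
dates <- as.Date('2014-11-21') + 7 * 0:3
data.frame( Date     = dates,
            Forecast = as.numeric( WTI.hwf$mean ),
            Lo95     = WTI.hwf$lower[ ,2],
            Hi95     = WTI.hwf$upper[ ,2] )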
We can then plot the forecast as follows:
plot.forecast( WTI.hwf )
ARIMA Forecasting
I prefer the outlook of the Holt-Winters forecast, but I decided to do an ARIMA(1,0,0) and an ARIMA(2,0,0) forecast also, for completeness. I’m still trying to understand how to interpret the ACF and PACF plots of time series. The canonical data sets make a ton of sense, but things get murkier with more typical data sets. You end up scratching your head and saying, “Hell, let’s just try an AR(1) process.” (This is where the forecast::auto.arima() function comes in handy; a quick sketch of that workflow follows.)
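Here is a minimal sketch of that diagnostic-plus-automatic workflow, assuming data$WTI is the weekly series loaded above (the object name WTI.auto and the four-week horizon are my choices):

library(forecast)

# look at the autocorrelation structure of the weekly WTI prices
acf( data$WTI )    # slow decay hints at an autoregressive, near-unit-root series
pacf( data$WTI )   # a sharp cutoff after lag p suggests an AR(p) term

# let auto.arima() search over (p,d,q) orders by AICc
WTI.auto <- auto.arima( data$WTI )
summary( WTI.auto )
plot( forecast( WTI.auto, h=4 ) )

The explicit AR(1) and AR(2) fits below do the same job with the order fixed by hand.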
# fit an AR(1) model to the weekly WTI prices
WTI.arima <- arima( data$WTI, order=c(1,0,0) )
# forecast the next 4 weeks
WTI.arimaf <- forecast.Arima( WTI.arima, h=4 )
# print a summary of the forecast
summary( WTI.arimaf )
Forecast method: ARIMA(1,0,0) with non-zero mean

Model Information:
Series: data$WTI
ARIMA(1,0,0) with non-zero mean

Coefficients:
         ar1  intercept
      0.9700    94.0842
s.e.  0.0218     5.1415

sigma^2 estimated as 4.727:  log likelihood=-317.58
AIC=641.16   AICc=641.34   BIC=650.07

Error measures:
                      ME     RMSE      MAE        MPE     MAPE     MASE     ACF1
Training set -0.09167768 2.174233 1.676698 -0.1544393 1.771276 0.997412 0.298963

Forecasts:
      Pt Forecast Lo 80 Hi 80 Lo 95 Hi 95
11/21       77.02 74.24 79.81 72.76 81.28
11/28       77.54 73.65 81.42 71.60 83.47
12/05       78.03 73.35 82.72 70.87 85.20
12/12       78.52 73.19 83.84 70.36 86.67
And for the AR(2) process, fit the same way with order=c(2,0,0), we have the following:
Forecast method: ARIMA(2,0,0) with non-zero mean

Model Information:
Series: data$WTI
ARIMA(2,0,0) with non-zero mean

Coefficients:
         ar1      ar2  intercept
      1.2725  -0.3220    94.9581
s.e.  0.0806   0.0829     3.2659

sigma^2 estimated as 4.278:  log likelihood=-310.4
AIC=628.79   AICc=629.08   BIC=640.67

Error measures:
                      ME     RMSE      MAE         MPE     MAPE      MASE        ACF1
Training set -0.03991496 2.068382 1.599582 -0.09327853 1.687477 0.9515379 -0.01486163

Forecasts:
      Pt Forecast Lo 80 Hi 80 Lo 95 Hi 95
11/21       76.85 74.20 79.50 72.79 80.90
11/28       77.86 73.57 82.15 71.30 84.42
12/05       79.03 73.53 84.53 70.62 87.44
12/12       80.20 73.79 86.60 70.40 89.99
Conclusions
I will revisit this post in two weeks with some more details on forecasting, and an update on these forecasts. For future reference, the ets() and auto.arima() functions from the forecast package are a great starting point for forecasting; they automatically fit a model and return model performance metrics.
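As a quick, hedged illustration of that starting point (the object names and the four-week horizon are my choices, not from the post):

library(forecast)

# automatically selected exponential-smoothing state-space model
WTI.ets <- ets( data$WTI )
summary( WTI.ets )                  # chosen model form, parameters, AICc, accuracy
plot( forecast( WTI.ets, h=4 ) )    # four-week forecast with prediction intervals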