Concatenating and Visualizing Data in Pandas

One of my favorite things about pandas is that you can easily combine temporal data sets using different time scales. Behind the scenes, pandas will fill in the empty gaps with null values, and then quietly ignore those null values when you want to make a scatter plot or do some other computation, like a rolling mean. It takes so much tedious book-keeping out of the data analysis process.

Combining Data

Suppose we want to look at prices for West Texas Intermediate, Henry Hub, and Brent Crude, or any other (reasonable) number of time series. We can put the separate data sets into separate pandas Series, and then concatenate those series into a single pandas DataFrame, and pandas will sort out the dates automatically. In this example, we’ll assume that wti, hh, and bc are NumPy arrays with two columns, and possibly different numbers numbers of rows. The first column will hold dates as strings, and the second column will hold prices.

import pandas

wti = pandas.Series( data=wti[:,1], index=wti[:,0], name="WTI" )
hh  = pandas.Series( data=hh[:,1],  index=hh[:,0],  name="HH" )
bc  = pandas.Series( data=bc[:,1],  index=bc[:,0],  name="BC" )

If you are creating pandas Series from an existing pandas DataFrame(s) then you should remember to grab the raw data from the columns using the .values method, no the just the column itself. For example,

# here, "gas" is some DataFrame with dates on the first column, and prices on the second
wti = pandas.Series( data=gas.icol(1).values, index=gas.icol(0).values, name="WTI" )

Next, we can create a data frame. This will match the dates up automatically, and fill in the gaps with NaN values.

df = pandas.concat( [ wti, hh, bc ], axis=1 )

Visualizing with a Scatter Plot Matrix

The documentation for plotting in Pandas can be found here. This is alternatively known as a trellis plot, which can be confusing when you’re trying to Google examples. The basic example is the following line of code.

axs = pandas.scatter_matrix( df, figsize=(12,12), diagonal='kde' ) ;

I also found some extra code here that cleans up the figure a bit.

def wrap(txt, width=10):
    '''helper function to wrap text for long labels'''
    import textwrap
    return '\n'.join(textwrap.wrap(txt, width))

for ax in axs[:,0]: # the left boundary
    ax.grid('off', axis='both')
    ax.set_ylabel(wrap(ax.get_ylabel()), rotation=0, va='center', labelpad=20)

for ax in axs[-1,:]: # the lower boundary
    ax.grid('off', axis='both')
    ax.set_xlabel(wrap(ax.get_xlabel()), rotation=45)

savefig( 'scatter_wti_hh_bc.png', fmt='png', dpi=300 )