I explained in a previous post how to quickly and easily grab data from Excel files using pandas. I use this approach when I know that there’s a ton of data in the Excel file and I want it in a pandas DataFrame. If I am only extracting a handful of values, I like to use a lower level module, `xlrd`

.

# Opening a Workbook

First we import `xlrd`

, then we define the file name and the sheet name that we are interested in. Next, in line 11, we open the Excel file as a `Book`

object. Then, in line 15, we use the `Book`

object to create a `Sheet`

object, from which we can access specific cells.

import xlrd # the filename fn = "workBook.xls" # the name of the sheet sht = "Summary" # open a workbook by filename # this returns a Book object wkbk = xlrd.open_workbook( fn ) # use the Book object # to produce a Sheet object sheet = wkbk.sheet_by_name( sht )

# Accessing Data

Given a row number and column number, we can access the data at that cell. While Excel cells are indexed by the column letter and the row number, Python indexes these guys by the row number and the column number, sans letters. Therefore the cell **K4** in Excel is **(3,11)** in Python.

# extract a value from the sheet row = 3 col = 5 value = sheet.cell_value( row, col ) # alternatively we can say value = sheet.cell( row, cal ).value

We can also grab a subset of a column; this is often referred to as a *slice*. Note that in Python the end of the slice is exclusive, not inclusive. (This complements the convention of indexing arrays beginning with zero rather than one as in Excel/VBA and MATLAB.) Therefore the range **B2:B5** in Excel is the slice **1:5** in Python; recall that the indexing begins at zero in Python, not one. As an example, let’s grab the values from the range **B2:B5**.

# specify the column # A -> 0, B-> 1,... col = 1 # indexing starts at 0, not 1 row_begin = 1 # <- 1 = 2-1 # again, indexing starts at 0, but the # upper bound is exclusive, not inclusive row_end = 5 # <- 5 = 5-1+1 values = sheet.col_values( col, row_begin, row_end )

# Handling Dates

Excel handles dates awkwardly, to quote the `xlrd`

documentation,

In reality, there are no such things. What you have are floating point numbers and pious hope.

Bearing that in mind, `xlrd`

offers some functionality for handling dates. It will return a tuple in the format `( year, month, day, hour, minute, second )`

. Assuming that our date and/or time is at **C4** in the Excel sheet, we can extract that date using the cell using the `cell_value()`

function described above, and by using the `xldate_as_tuple()`

function provided by the `xlrd`

module. We will also need to pass `datamode`

attribute of the `Book`

object to the function.

xlrd.xldate_as_tuple( sheet.cell_value( 3, 2 ), wkbk.datemode )

# Update

Since I’ve written this, a new tool has come out called PyXLL that allows users to write Excel add-ins in Python, and perform other tasks. Learn more here: https://www.pyxll.com/blog/tools-for-working-with-excel-and-python/