I explained in a previous post how to quickly and easily grab data from Excel files using pandas. I use this approach when I know that there’s a ton of data in the Excel file and I want it in a pandas DataFrame. If I am only extracting a handful of values, I like to use a lower level module,
Opening a Workbook
First we import
xlrd, then we define the file name and the sheet name that we are interested in. Next, in line 11, we open the Excel file as a
Book object. Then, in line 15, we use the
Book object to create a
Sheet object, from which we can access specific cells.
import xlrd # the filename fn = "workBook.xls" # the name of the sheet sht = "Summary" # open a workbook by filename # this returns a Book object wkbk = xlrd.open_workbook( fn ) # use the Book object # to produce a Sheet object sheet = wkbk.sheet_by_name( sht )
Given a row number and column number, we can access the data at that cell. While Excel cells are indexed by the column letter and the row number, Python indexes these guys by the row number and the column number, sans letters. Therefore the cell K4 in Excel is (3,11) in Python.
# extract a value from the sheet row = 3 col = 5 value = sheet.cell_value( row, col ) # alternatively we can say value = sheet.cell( row, cal ).value
We can also grab a subset of a column; this is often referred to as a slice. Note that in Python the end of the slice is exclusive, not inclusive. (This complements the convention of indexing arrays beginning with zero rather than one as in Excel/VBA and MATLAB.) Therefore the range B2:B5 in Excel is the slice 1:5 in Python; recall that the indexing begins at zero in Python, not one. As an example, let’s grab the values from the range B2:B5.
# specify the column # A -> 0, B-> 1,... col = 1 # indexing starts at 0, not 1 row_begin = 1 # <- 1 = 2-1 # again, indexing starts at 0, but the # upper bound is exclusive, not inclusive row_end = 5 # <- 5 = 5-1+1 values = sheet.col_values( col, row_begin, row_end )
Excel handles dates awkwardly, to quote the
In reality, there are no such things. What you have are floating point numbers and pious hope.
Bearing that in mind,
xlrd offers some functionality for handling dates. It will return a tuple in the format
( year, month, day, hour, minute, second ). Assuming that our date and/or time is at C4 in the Excel sheet, we can extract that date using the cell using the
cell_value() function described above, and by using the
xldate_as_tuple() function provided by the
xlrd module. We will also need to pass
datamode attribute of the
Book object to the function.
xlrd.xldate_as_tuple( sheet.cell_value( 3, 2 ), wkbk.datemode )
Since I’ve written this, a new tool has come out called PyXLL that allows users to write Excel add-ins in Python, and perform other tasks. Learn more here: https://www.pyxll.com/blog/tools-for-working-with-excel-and-python/