This weekend I wanted to work on collecting and plotting historical option contract prices. I used the following curl command to pull the AAPL option chain page from Yahoo! and save the raw HTML to a file named aapl:
curl -X GET "http://finance.yahoo.com/q/op?s=AAPL&m=2016-01" | cat > aapl
BeautifulSoup
The first bit imports BeautifulSoup and pandas, and the second bit grabs a filename from the command line, opens the file as data, and passes data through BeautifulSoup to produce soup. I knew from looking at the raw HTML that the call and put option contracts were located in a div element with the class attribute follow-quote-area. So, in line 11, I grabbed the two tables in that div element and unpacked them into calls and puts.
Next, I define a function for iterating through the table rows, using x.find_all("tr"), and then iterating through the data items in each row, using row.find_all("td"), to get the items in the table. The last step in the extractData() function cleans up the output by tossing any rows that don’t contain exactly 10 elements, one per column.
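#!/usr/bin/env python

from bs4 import BeautifulSoup
import pandas as pd
import sys

filename = sys.argv[1]  # path to the saved HTML page, e.g. "aapl"
data = open( filename, "r" )
soup = BeautifulSoup( data, "html.parser" )
data.close()
calls, puts = soup.find_all( attrs={ "class": "follow-quote-area" } )

def extractData( x ):
    arr = []
    for row in x.find_all("tr"):
        arr.append([])
        # "cell" rather than "data", so the file handle name is not shadowed
        for cell in row.find_all("td"):
            value = cell.get_text().strip()
            arr[-1].append( value )
    # keep only complete rows; a list comprehension returns a real list,
    # whereas filter() returns a lazy iterator under Python 3
    arr = [ r for r in arr if len(r) == 10 ]
    return arr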
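To sanity-check the row filter without downloading anything, here is a minimal sketch that runs the same loop on a tiny HTML fragment. The markup, the values, and the extract_rows name are all made up for illustration and are not taken from Yahoo!'s actual page; only the 10-cell rule comes from the script above.

```python
from bs4 import BeautifulSoup

# Invented sample values: one complete row with 10 cells.
cells = ["90.00", "AAPL160115C00090000", "25.10", "24.90", "25.30",
         "0.00", "0.00%", "12", "340", "45.31%"]

# Hypothetical markup: a header row (th cells only), one complete data
# row, and one short row that the filter should discard.
html = (
    '<div class="follow-quote-area"><table>'
    "<tr><th>Strike</th><th>Contract Name</th></tr>"
    "<tr>" + "".join("<td> %s </td>" % c for c in cells) + "</tr>"
    "<tr><td>footer</td></tr>"
    "</table></div>"
)

def extract_rows(x, width=10):
    # Collect the stripped text of every td cell, row by row.
    rows = []
    for row in x.find_all("tr"):
        rows.append([cell.get_text().strip() for cell in row.find_all("td")])
    # Drop rows that do not have exactly `width` cells.
    return [r for r in rows if len(r) == width]

soup = BeautifulSoup(html, "html.parser")
(area,) = soup.find_all(attrs={"class": "follow-quote-area"})
rows = extract_rows(area)
print(len(rows))
print(rows[0][:2])
```

The header row contributes an empty list because it has no td cells, so it is dropped by the same length check as the short row.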
pandas
Next, we can store this data in a pandas object for immediate processing, or we can pickle the pandas DataFrame objects for later use.
calls = extractData( calls )
puts = extractData( puts )

columns = [
    "Strike",
    "ContractName",
    "Last",
    "Bid",
    "Ask",
    "Change",
    "PctChange",
    "Volume",
    "OpenInterest",
    "ImpliedVolatility",
]

calls = pd.DataFrame( calls, columns=columns )
puts = pd.DataFrame( puts, columns=columns )
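The pickling step isn't shown in the script, so here is a minimal sketch of one way to do it, assuming the standard pandas to_pickle() and read_pickle() pair. The file name aapl_calls.pkl and the single row of values are placeholders I made up; in practice the rows would come from extractData().

```python
import os
import tempfile

import pandas as pd

columns = ["Strike", "ContractName", "Last", "Bid", "Ask", "Change",
           "PctChange", "Volume", "OpenInterest", "ImpliedVolatility"]

# One invented row standing in for the output of extractData().
rows = [["90.00", "AAPL160115C00090000", "25.10", "24.90", "25.30",
         "0.00", "0.00%", "12", "340", "45.31%"]]
calls = pd.DataFrame(rows, columns=columns)

# Hypothetical file name; a temp directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "aapl_calls.pkl")
calls.to_pickle(path)

# Later, possibly in another script, load the DataFrame back.
restored = pd.read_pickle(path)
print(restored.equals(calls))
```

Note that every value is still a string at this point, since it came from get_text(), so the columns would need converting to numbers before plotting.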