This weekend I wanted to work on collecting and plotting historical option contract prices. I used the following call to pull the option contract page from Yahoo!:
curl "http://finance.yahoo.com/q/op?s=AAPL&m=2016-01" > aapl
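If you would rather do the download step in Python as well, a minimal sketch along these lines should work. It assumes the third-party requests library; the URL and the aapl output filename simply mirror the curl call above.

#!/usr/bin/env python
# Hypothetical stand-in for the curl call: fetch the quote page and save it
# to disk so the parsing script below can read it.
import requests

url = "http://finance.yahoo.com/q/op?s=AAPL&m=2016-01"
response = requests.get( url )
response.raise_for_status()   # stop here on a non-200 response

with open( "aapl", "w" ) as f:
    f.write( response.text )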
BeautifulSoup
The first bit imports BeautifulSoup and pandas; the second grabs a filename from the command line, opens the file as data, and passes data through BeautifulSoup to produce soup. I knew from looking at the raw HTML that the call and put option contracts were located in a div element whose class attribute was follow-quote-area, so the line that defines calls and puts grabs the two tables in that div.
Next, I define a function that iterates through the table rows with x.find_all("tr"), and then through the data cells in each row with row.find_all("td"), to collect the items in the table. The last line of extractData() cleans up the output by tossing any rows that don't contain exactly 10 elements.
#!/usr/bin/env python
from bs4 import BeautifulSoup
import pandas as pd
import sys

# Read the saved HTML file named on the command line and parse it.
filename = sys.argv[1]
data = open( filename, "r" )
soup = BeautifulSoup( data, "html.parser" )

# The two matching elements hold the calls table and the puts table, in that order.
calls, puts = soup.find_all( attrs={ "class": "follow-quote-area" } )

def extractData( x ):
    arr = []
    for row in x.find_all("tr"):
        arr.append([])
        for data in row.find_all("td"):
            value = data.get_text().strip()
            arr[-1].append( value )
    # Keep only complete rows with exactly 10 cells.
    arr = [ row for row in arr if len(row) == 10 ]
    return arr
pandas
Next, we can store this data in a pandas object for immediate processing, or we can pickle the pandas DataFrame objects for later use.
calls = extractData( calls )
puts = extractData( puts )
columns = [ "Strike"
, "ContractName"
, "Last"
, "Bid"
, "Ask"
, "Change"
, "PctChange"
, "Volume"
, "OpenInterest"
, "ImpliedVolatility" ]
calls = pd.DataFrame( calls, columns=columns )
puts = pd.DataFrame( puts, columns=columns )
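To pickle the DataFrames for later use, pandas' to_pickle() and read_pickle() do the job; the .pkl filenames below are just placeholders.

# Save the parsed tables so they can be reloaded without re-scraping.
calls.to_pickle( "aapl_calls.pkl" )
puts.to_pickle( "aapl_puts.pkl" )

# Later, in another session:
calls = pd.read_pickle( "aapl_calls.pkl" )
puts = pd.read_pickle( "aapl_puts.pkl" )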