Simple Reservoir Algorithm in Python

I needed to sample lines from a file, and I found this SO post on on the Reservoir Algorithm. The code below lets you sample a single line from a file in O(n) time. What makes this code special, besides it’s linear time, simplicity and elegance, is the fact that the file you’re sampling from doesn’t need to fit in memory.

import random

def random_line( filename ):
    hdl = open( filename, "r" )
    line = next( hdl )
    for num, aline in enumerate( hdl ):
        if random.randrange(num + 2):
        line = aline
    return line

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.