I needed to sample lines from a file, and I found this SO post on on the Reservoir Algorithm. The code below lets you sample a single line from a file in O(n) time. What makes this code special, besides it’s linear time, simplicity and elegance, is the fact that the file you’re sampling from doesn’t need to fit in memory.
import random def random_line( filename ): hdl = open( filename, "r" ) line = next( hdl ) for num, aline in enumerate( hdl ): if random.randrange(num + 2): continue line = aline return line