In this post I’ll discuss the basics of walking through a directory tree in Python and Go. If you are dealing with a smaller directory, it may be more convenient to use Python. If you are dealing with a larger directory containing hundreds of subdirectories and thousands of files, you may want to look into using Go, or another compiled language. I enjoy using Go because it compiles quickly, and it doesn’t use pointer arithmetic.
Python
The outer loop walks the directory tree starting at path using the function os.walk(). Each iteration yields a tuple containing the directory path, a list of subdirectories, and a list of filenames. An inner loop then collects the full path of every file in that directory. Once the walk is finished, we write all of the paths to a text file and report the time taken for the whole ordeal.
    import os
    import time

    path = ''  # set this to the root of the directory tree to walk
    fns = []

    t0 = time.time()

    # Collect the full path of every file under path
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            fns.append(os.path.join(dirpath, filename))

    # Write one path per line
    with open('filenames_python.txt', 'w') as h:
        for fn in fns:
            h.write(fn + '\n')

    t1 = time.time()
    print('Total: {}'.format(round((t1 - t0) / 60.0, 3)))  # elapsed minutes
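To make the shape of the os.walk() loop easier to see, here is a minimal sketch that wraps the same two loops in a function and runs it on a small throwaway tree. The function name list_files and the demo file names are hypothetical, chosen only for illustration.

```python
import os
import tempfile

def list_files(root):
    """Return the full path of every file under root, using os.walk()."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return paths

# Build a tiny throwaway tree (hypothetical names) to walk.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(demo_root, rel), "w") as f:
        f.write("demo\n")

# Show the paths relative to the root so the output is easy to read.
found = sorted(os.path.relpath(p, demo_root) for p in list_files(demo_root))
print(found)
```

The subdirectory sub appears only as part of a file path, because the inner loop iterates over filenames and ignores dirnames.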
Go
Here, we have a slightly simpler Go script that prints every path it visits (directories as well as files) to the standard output. The path/filepath package contains code for traversing the directory tree, flag contains code for parsing command line arguments, fmt gives us printing functionality, and os gives us access to the file system.
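    package main

    import (
        "flag"
        "fmt"
        "os"
        "path/filepath"
    )

    // visit is called for every file and directory that filepath.Walk finds.
    func visit(path string, f os.FileInfo, err error) error {
        if err != nil {
            // Report unreadable entries on stderr and keep walking
            fmt.Fprintf(os.Stderr, "error at %s: %v\n", path, err)
            return nil
        }
        fmt.Println(path)
        return nil
    }

    func main() {
        flag.Parse()
        root := flag.Arg(0) // first positional argument: the directory to walk
        err := filepath.Walk(root, visit)
        // Use stderr so this message stays out of redirected output
        fmt.Fprintf(os.Stderr, "filepath.Walk() returned %v\n", err)
    }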
Since I did not have the patience to learn how to time things from the Windows command line, I called this from an IPython notebook session as follows:
    # Assumes time, os, and path are already defined in the session
    t0 = time.time()
    os.system("go run file_path_walk.go " + path + " > filenames_go.txt")
    t1 = time.time()
    print("Total: {}".format(round((t1 - t0) / 60.0, 3)))  # elapsed minutes
This uses the redirection operator > to redirect data from the standard output to a text file.
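The same redirection can also be done without a shell by handing an open file to the subprocess module. The sketch below is one possible way to do that, not what was used for the timings in this post. The helper name run_to_file and the output file name are hypothetical, and a stand-in command replaces the Go program so the example runs without Go installed.

```python
import os
import subprocess
import sys
import tempfile
import time

def run_to_file(cmd, out_path):
    """Run cmd (a list of arguments), send its standard output to out_path,
    and return the elapsed wall-clock time in seconds."""
    t0 = time.time()
    with open(out_path, "w") as out:
        # stdout=out plays the role of the shell's > operator
        subprocess.check_call(cmd, stdout=out)
    return time.time() - t0

# Stand-in command; with Go installed it would be something like
# ["go", "run", "file_path_walk.go", path].
out_path = os.path.join(tempfile.gettempdir(), "demo_output.txt")
elapsed = run_to_file([sys.executable, "-c", "print('hello')"], out_path)
print("Total: {}".format(round(elapsed / 60.0, 3)))  # elapsed minutes
```

Passing the command as a list avoids quoting problems when the path contains spaces, which is common on Windows.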
Conclusion
I compared the two approaches by walking through a remotely hosted directory tree with over 90,000 files in hundreds of subdirectories. The Python approach took about two and a half hours, while the Go approach took just under an hour.
For more examples of filepath.Walk see http://xojoc.pw/justcode/golang-file-tree-traversal.html