Extracting Images from .eml Files

In this post I’ll provide some code for parsing an .eml file and extracting the images attached to it. I was able to perform the parsing with the help of a great blog post I found here. Turning the blocks of ASCII text back into JPEGs and PNGs took some extra work.

Data

You can get sample data by opening a message in your email client and finding out how to view its original text. In Gmail, expand the menu next to the reply button (which looks like a swoopy left arrow) and select Show Original. This should open a new browser tab with a bunch of ASCII text. Copy this text and save it as some.eml, or whatever name you prefer.

Code

Go to the link in the introduction, scroll down and save parsemail.py in the same directory as some.eml. Open a Python interpreter or IPython notebook from this directory and start with the following:

import email
import io
from PIL import Image
from parsemail import get_mail_contents

The email module parses the text of the .eml file into a message object. The get_mail_contents() function from the parsemail module does what you’d expect: it pulls the individual parts out of that message. The io module wraps the payload bytes in a file-like object, and the Image module reads that object and turns the byte data into an actual image.
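To make the "blocks of ASCII" idea concrete, here is a minimal sketch using only the standard library. It assumes the image is stored in the .eml as base64 text, which is the usual encoding for binary attachments, and uses the 8-byte PNG signature as stand-in data rather than a real image. The variable names are just for illustration.

```python
import base64
import io

# The 8-byte signature that starts every PNG file, used here as stand-in data
raw = b"\x89PNG\r\n\x1a\n"

# Roughly how those bytes would appear inside the .eml text
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)  # iVBORw0KGgo=

# Decoding reverses the encoding and gives back the original bytes
decoded = base64.b64decode(encoded)

# BytesIO wraps the bytes in a file-like object that can be read like an open file
fh = io.BytesIO(decoded)
print(fh.read(4))  # b'\x89PNG'
```

The code in this post passes c.payload straight to io.BytesIO, so it relies on the payload already being decoded bytes by the time it gets there.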

Note: PIL, the Python Imaging Library, has been discontinued, so you should pip install the Pillow package instead. Importing looks the same though: from PIL import Image works as usual.

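em = "some.eml"
# Parse the raw text of the .eml file into a message object
with open(em, "r") as f:
    msg = email.message_from_string(f.read())
# Split the message into its individual parts
contents = get_mail_contents(msg)

for c in contents:
    # Only parts that carry a filename are treated as attachments
    if c.filename is not None:
        atype, afmt = c.type.split('/')
        if atype == 'image':
            # Wrap the payload bytes in a file-like object so PIL can open it
            fh = io.BytesIO(c.payload)
            im = Image.open(fh)
            im.save(c.filename)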

This will save the images with their original names and extensions in your working directory, like magic.
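If you would rather not depend on parsemail.py, the same idea can be sketched with the standard library alone. This is an alternative, not the approach used above: it walks the message parts with the email module and writes the decoded bytes straight to disk, so it does not need PIL either. The function name extract_images and its out_dir parameter are made up for this example.

```python
import email
from email import policy
from pathlib import Path


def extract_images(raw_bytes, out_dir="."):
    """Save every named image part of a raw .eml message; return the saved paths."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    saved = []
    # walk() visits the message itself and every nested part
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        name = part.get_filename()
        if name is None:
            continue
        # decode=True undoes the transfer encoding (e.g. base64) and returns bytes
        data = part.get_payload(decode=True)
        # Keep only the base name so a filename cannot point outside out_dir
        target = Path(out_dir) / Path(name).name
        target.write_bytes(data)
        saved.append(target)
    return saved
```

Calling extract_images(Path("some.eml").read_bytes()) should save the images in the working directory. Because the bytes are written as-is, nothing is re-encoded, but nothing checks that each file is a valid image either, which Image.open() does for you.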