Downloading Images with Python

Outline

The imports required
How to structure file paths
Retrieving the image data
Bringing it all together

Have you ever built a web scraper and wanted to download more than HTML files? In this post, we will discuss how to download images from the web using a byte stream.

The imports required

For this project, we will need the requests library to get the image file contents from a remote server and the os library to deal with file paths.

import requests
import os

How to structure file paths

When scraping images from the web, we have to decide how to name the saved files. Three options present themselves:

1. Create a random string for the filename
2. Strip the filename from the URL
3. Recreate the original file path using the URL

Option 1 is often a viable solution, but you lose the original context of the image (i.e. its path and filename). Option 2 runs into the issue of multiple images sharing the same filename, so you have to deal with duplicates. Option 3 solves the problems of options 1 and 2, although it places the scraped images in subfolders, which makes it harder to view many images at once.
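
For comparison, options 1 and 2 can be sketched as follows. This is only an illustration: the helper names random_filename and url_basename and the example.com link are made up for this post, and the sketch assumes we want to keep the file extension and ignore any query string.

```python
import os
import uuid
from urllib.parse import urlparse

def random_filename(link):
    """Option 1: random name, keeping the extension from the URL path."""
    ext = os.path.splitext(urlparse(link).path)[1]  # e.g. '.png', or '' if none
    return uuid.uuid4().hex + ext

def url_basename(link):
    """Option 2: last path segment of the URL, ignoring any query string."""
    return os.path.basename(urlparse(link).path)

link = 'https://example.com/images/icons/logo.png?v=2'  # hypothetical image URL
print(random_filename(link))  # 32 hex characters followed by '.png'
print(url_basename(link))     # logo.png
```

Note that url_basename returns the same name for every logo.png on a site, whatever folder it lives in, which is exactly the duplicate problem described above.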

Option 3 can be implemented as follows:

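BASE_FOLDER = 'saves'
domain      = 'https://<some domain>.com'

# link is the full image URL and must start with domain.
# domain[8:] drops the leading 'https://' (8 characters);
# link[len(domain):] keeps the path that follows the domain.
filename = BASE_FOLDER + '/' + domain[8:] + link[len(domain):]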

All images are placed in a base folder and then in a first-level subfolder named after the domain (https:// is stripped). The rest of the link creates further subfolders and the final filename. To make sure we can save the file, we use the os library to create any folders in the path that do not exist yet.

# make sure path exists
os.makedirs(os.path.dirname(filename), exist_ok=True)
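
The slicing above assumes every link starts with https:// and with the given domain. As an alternative sketch (not the approach used in the rest of this post), the standard library's urllib.parse can split the URL for us. The function name build_path and the example.com links are hypothetical, and dropping the query string and any '..' segments is a design assumption.

```python
import os
from urllib.parse import urlparse

BASE_FOLDER = 'saves'

def build_path(link, base_folder=BASE_FOLDER):
    """Mirror the URL's host and path under base_folder."""
    parts = urlparse(link)
    # parts.hostname is the lowercased host without scheme or port
    host = parts.hostname or ''
    # Skip empty, '.' and '..' segments so the path stays inside base_folder
    segments = [s for s in parts.path.split('/') if s not in ('', '.', '..')]
    return os.path.join(base_folder, host, *segments)

print(build_path('https://example.com/images/icons/logo.png?v=2'))
# saves/example.com/images/icons/logo.png on POSIX systems
```

This version works for both http and https links and does not need the domain passed in separately.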

Retrieving the image data

To retrieve the image contents, we must use the requests library.

# Grab the image content
img_data = requests.get(link).content

The content attribute contains the bytes needed to write the image file. This is the key part of downloading images in Python. Once we have the bytes, we can use the standard Python procedure to write a file using the wb access mode (i.e. write binary).

# Write to file
with open(filename, 'wb') as f:
    f.write(img_data)
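
One caveat: content loads the whole response into memory, and for a failed request (a 404, say) it holds the bytes of the error page rather than an image. A possible variation, shown here only as a sketch, streams the body to disk in chunks and raises on HTTP errors. The function name download_image, the 10 second timeout, and the 8192 byte chunk size are assumptions for illustration.

```python
import requests

def download_image(link, filename, timeout=10):
    """Stream link to filename; raises requests.HTTPError on a 4xx/5xx reply."""
    with requests.get(link, stream=True, timeout=timeout) as response:
        # Stop here instead of saving an error page as if it were an image
        response.raise_for_status()
        with open(filename, 'wb') as f:
            # iter_content yields the body in chunks of at most chunk_size bytes
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
```

For small images the simpler content approach is fine; streaming mainly helps with large files.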

Bringing it all together

import requests
import os

BASE_FOLDER = 'saves'

def save_image(link, domain=None, filename=None):
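    """
        link   = https://<some image URL>
        domain = https://<some domain>.com (required unless filename is set)
        Saves under BASE_FOLDER unless filename is set
    """
    print('Downloading (img): ', link)

    # Build the file path based on the link
    if not filename:
        if not domain:
            raise ValueError('Either domain or filename must be provided')
        filename = BASE_FOLDER + '/' + domain[8:] + link[len(domain):]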

    # Don't overwrite if the file already exists
    if os.path.exists(filename):
        return

    # make sure path exists (dirname is '' for a bare filename, so skip it then)
    folder = os.path.dirname(filename)
    if folder:
        os.makedirs(folder, exist_ok=True)

    # Grab the image content
    img_data = requests.get(link).content

    # Write to file
    with open(filename, 'wb') as f:
        f.write(img_data)

We now have a concise function to handle our image downloads.
