Downloading Landsat Satellite Images with Python | by Conor O’Sullivan | May, 2023

By Jessie Hobb On May 9, 2023

Streamline Landsat scene downloads with the landsatxplore Python package

Landsat satellites are among the most commonly used sources of Earth observation data. They have been providing high-quality images of the planet’s surface for over four decades. However, manually downloading these images can be tedious! Fortunately, with the landsatxplore package, you can easily download and process Landsat scenes with a few lines of code.

We will explore the landsatxplore package and walk through the steps to download Landsat satellite images using Python. This includes:

Setting up API access with a USGS account
Searching and filtering Landsat scenes
Downloading and working with the scenes in Python

Say goodbye to manual downloads and hello to an automated, efficient workflow!

Step 1: Register for USGS

To start, you will need to setup a USGS account. This is the same account you use to download scenes with EarthExplorer. Keep your username and password as we will need it later.

Once you are registered, you can use the USGS M2M API. However, this would require some work to set it up. Instead, we will use the lansatxplore package. It abstracts away most of the technical details for you.

Step 2: Install lansatxplore

Follow the instructions on the GitRepo.

Step 3: Check the API connection

Use the code below to confirm that everything is set up correctly. You should replace the username and password with the details you used to register your USGS account.

from landsatxplore.api import API# Your USGS  credentials
username = "XXXXXXXXXXXX"
password = "XXXXXXXXXXXX"
# Initialize a new API instance
api = API(username, password)
# Perform a request
response = api.request(endpoint="dataset-catalogs")
print(response)

The output of the response should look like this:

{‘EE’: ‘EarthExplorer’, ‘GV’: ‘GloVis’, ‘HDDS’: ‘HDDS Explorer’}

These are the datasets available through the API. For this tutorial, we’re only considering the EarthExplorer dataset.

Before we move on to downloading scenes with the API, we’ll do a manual search with EarthExplorer. This is so we can compare the results to what we see using Python. If you are not familiar with the EarthExplorer portal then this tutorial may help.

We search for scenes using the following criteria:

They must contain the point at the given latitude and longitude. This point falls on Bull Island in Dublin.
Taken between 01/01/2020 to 12/31/2022
Maximum cloud cover of 50%
Part of the Level 2 Landsat 8 or 9 collection

EarthExplorer search criteria. It includes details on the location of scenes, data of acquisition, maximum cloud cover and Landsat collection. — EarthExplorer search criteria (source: author)

You can see the results of the search below. We note some things to compare to our Python search:

There are 54 scenes that match the search criteria.
There are 2 tiles that contain the point on Bull Island. These have path and row values of (206, 023) and (205,023).
The first scene’s ID is LC08_L2SP_206023_20221118_20221128_02_T1. If you are interested in what this ID means then see the Landsat naming convention.

EarthExplorer search results. 54 unique scenes are returned. — EarthExplorer search results (source: author)

Searching for scenes

Let’s do the equivalent search using landsatxplore. We do this using the search function below:

dataset — defines which satellite we want scenes for. The value we used is the Dataset ID for Landsat 8 and 9. See the GitRepo for the ID for Landsat 5 and 7.
The latitude and longitude give the same point on Bull Island. Except we have converted the coordinates to decimals.
The start_date, end_date and max_cloud_cover are also the same as before.

# Search for Landsat TM scenes
scenes = api.search(
dataset='landsat_ot_c2_l2',
latitude=53.36305556,
longitude=-6.15583333,
start_date='2020-01-01',
end_date='2022-12-31',
max_cloud_cover=50
)# log out
api.logout()

The search result will return information in a JSON dictionary. We convert this to a Pandas DataFrame (line 4) where every row will represent a unique scene. A lot of metadata is returned! So, we filter what is necessary for this application (line 5). Lastly, we order this by the acquisition_date — the date the scene was captured by Landsat.

import pandas as pd# Create a DataFrame from the scenes
df_scenes = pd.DataFrame(scenes)
df_scenes = df_scenes[['display_id','wrs_path', 'wrs_row','satellite','cloud_cover','acquisition_date']]
df_scenes.sort_values('acquisition_date', ascending=False, inplace=True)

You can see a snapshot of this dataset below. Comparing this to our search using EarthExplorer, we can be certain that the results are the same. This dataset has 54 rows and there are two unique wrs_path and wrs_row pairs — (206, 23) and (205,23). The first display_id is also the same as what we saw before.

Snapshot of dataset with landsat scenes obtained using Python. — Snapshot of df_scenes dataset (source: author)

If we wanted, we could filter that dataset further. We could use the satellite column to select only images from Landsat 8 or 9. Additionally, the cloud_cover column gives the percentage of the image covered by clouds. When you are happy with the final list of scenes, you can move on to downloading them.

Downloading data

Below, we have the code used to download a Landsat scene. We use the EarthExplorer function (line 5). This is initialised in the same way as before — using your USGS credentials. To download a scene, we need to use its display_id and define the output directory (line 12). We are using the display_id of the first scene we mentioned above (line 8).

from landsatxplore.earthexplorer import EarthExplorer
import os# Initialize the API
ee = EarthExplorer(username, password)
# Select the first scene
ID = 'LC08_L2SP_206023_20221118_20221128_02_T1'
# Download the scene 
try: 
ee.download(ID, output_dir='./data')
print('{} succesful'.format(ID))
# Additional error handling
except:
if os.path.isfile('./data/{}.tar'.format(ID)):
print('{} error but file exists'.format(ID))
else:
print('{} error'.format(ID))
ee.logout()

You may have noticed the additional error handling above. This is because of an issue with the package. In some cases, scenes will be downloaded correctly but an error is still raised. The additional error handling double-checks that a scene’s file exists.

Working with the data

A scene will be downloaded as a tar file. The name of the file will be the display_id followed by .tar:

LC08_L2SP_206023_20221118_20221128_02_T1.tar

We can work with this data directly in Python. To start, we need to unzip the tarfile (lines 4–6). The name of the new folder is set to the scene’s display_id (line 5).

import tarfile# Extract files from tar archive
tar = tarfile.open('./data/{}.tar'.format(ID))
tar.extractall('./data/{}'.format(ID))
tar.close()

You can see the unzipped folder and all the available files below. This includes all the information available for Lansat level-2 science products. The applications for this data are endless! For one example, we’ll visualise this scene using the visible light bands. These are available in the files highlighted below.

The list of downloaded Landsat level-2 science product files. The visiable light bands are highlighted. — Landsat level-2 science product files (source: author)

We load the blue, green and red bands (lines 6–8). We stack these bands (line 11), scale them (line 12) and clip them to enhance the contrast (line 15). Finally, we display this image using matplotlib (lines 18–20). You can see this image below.

import tifffile as tiff
import numpy as np
import matplotlib.pyplot as plt# Load Blue (B2), Green (B3) and Red (B4) bands
B2 = tiff.imread('./data/{}/{}_SR_B2.TIF'.format(ID, ID))
B3 = tiff.imread('./data/{}/{}_SR_B3.TIF'.format(ID, ID))
B4 = tiff.imread('./data/{}/{}_SR_B4.TIF'.format(ID, ID))
# Stack and scale bands
RGB = np.dstack((B4, B3, B2))
RGB = np.clip(RGB*0.0000275-0.2, 0, 1)
# Clip to enhance contrast
RGB = np.clip(RGB,0,0.2)/0.2
# Display RGB image
fig, ax = plt.subplots(figsize=(10, 10))
plt.imshow(RGB)
ax.set_axis_off()

A visualisation of the RGB channels of the Landsat scene. It shows the east coast of Ireland. — Visualising the RGB channels of the Landsat scene (source: author)

Check out this article, if you want more details on working with the RGB channels of satellite images:

Visualizing RGB channels is only the beginning. After downloading the data, we can do any remote sensing task — from calculating indices and training models. The best part is we can do this all without leaving our Python notebook.