Artificial Intelligence for Geospatial Analysis with Pytorch’s TorchGeo (part 3) By Mauricio Cordeiro

By Jessie Hobb On Dec 14, 2022

An end-to-end deep learning geospatial segmentation project using Pytorch and TorchGeo packages

In the previous stories (Part 1 and Part 2), we saw how to prepare a dataset of raster (multispectral) images and combined them with the corresponding labels (ground truth masks) using the IntersectionDataset provided by TorchGeo. To draw samples from it (smaller fixed-sized patches required for training), the RandomGeoSampler was used altogether with the DataLoader object (responsible for providing the batches — group of samples — to the training procedure).

In addition, we added spectral indices and normalization to each batch using the nn.Sequential class. Now, in this last part, we will see how to create a model that is capable of “learning” to correctly segment our images and how to put everything together into a training loop.

So let’s get started!

First, we need to catch up from where we stopped and prepare the Datasets and Dataloaders for the task. Basically, we will need to create two Dataloaders, for training and for validation, following the same procedures as before, as below.

Datasets and DataLoaders

train_imgs = RasterDataset(root=(root/'tra_scene').as_posix(), crs='epsg:3395', res=10, transforms=scale)
train_msks = RasterDataset(root=(root/'tra_truth').as_posix(), crs='epsg:3395', res=10)valid_imgs = RasterDataset(root=(root/'val_scene').as_posix(), crs='epsg:3395', res=10, transforms=scale)
valid_msks = RasterDataset(root=(root/'val_truth').as_posix(), crs='epsg:3395', res=10)
# IMPORTANT
train_msks.is_image = False
valid_msks.is_image = False
train_dset = train_imgs & train_msks
valid_dset = valid_imgs & valid_msks
train_sampler = RandomGeoSampler(train_imgs, size=512, length=260, units=Units.PIXELS)
valid_sampler = RandomGeoSampler(valid_imgs, size=512, length=128, units=Units.PIXELS)

The length specified in the sampler is the number of samples provided in one “pass” (also called one epoch). This is normally the number of patches in the dataset and one epoch should go through all the samples. However, as we are working with a random sampler, we cannot guarantee we are covering all the region. In this case, I defined the length as being four times the number of images in each dataset (train and validation).

Now, let’s create the DataLoaders and check if they are working as expected.

train_dataloader = DataLoader(train_dset, sampler=train_sampler, batch_size=8, collate_fn=stack_samples)
valid_dataloader = DataLoader(valid_dset, sampler=valid_sampler, batch_size=8, collate_fn=stack_samples)train_batch = next(iter(train_dataloader))
valid_batch = next(iter(valid_dataloader))
train_batch.keys(), valid_batch.keys()
code output: 
(dict_keys(['image', 'crs', 'bbox', 'mask']),
dict_keys(['image', 'crs', 'bbox', 'mask']))

Standardization and Spectral indices

For standardization and spectral indices, the procedure is the same that has already been presented in Parts 1 & 2. It is the same for the visualization routines. The following notebook has everything updated up to the correct batch creation.

The last cell shows a sample of a validation dataset batch and the shape of the validation image with 9 channels (6 channels + 3 indices), as expected.

For the semantic segmentation model, we are going to use a predefined architecture that is available in Pytorch. Looking at the official documentation (https://pytorch.org/vision/stable/models.html#semantic-segmentation) it is possible to note 3 models available for semantic segmentation, but one (LRASPP) is intended for mobile applications. In our tutorial, we will use the DeepLabV3 model.

So, let’s create a DeepLabV3 model for 2 classes. In this case, I will skip the pretrained weights, as the weights represent another domain (not water segmentation from multispectral imagery).

from torchvision.models.segmentation import deeplabv3_resnet50
model = deeplabv3_resnet50(weights=None, num_classes=2)model
code output: 
DeepLabV3(
(backbone): IntermediateLayerGetter(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
...

The first thing we have to pay attention in the model architecture is the number of channels expected in the first convolution (Conv2d), that is defined as 3. That’s because the model is prepared to work with RGB images. After the first convolution, the 3 channels will produce 64 channels in lower resolution, and so on. As we have now 9 channels, we will change this first processing layer to adapt correctly to our model. We can do this by replacing the first convolutional layer for a new one, by following the commands. Finally, we check a mock batch can pass through the model and provide the output with 2 channels (water / no_water) as desired.

backbone = model.get_submodule('backbone')conv = nn.modules.conv.Conv2d(
in_channels=9, 
out_channels=64, 
kernel_size=(7, 7),
stride=(2, 2),
padding=(3, 3),
bias=False
)
backbone.register_module('conv1', conv)
pred = model(torch.randn(3, 9, 512, 512))
pred['out'].shape
code output: 
torch.Size([3, 2, 512, 512])

Our architecture seems to be working as expected. The next step is to train it. So let’s create a training loop for it.

The training function should receive the number of epochs, the model, the dataloaders, the loss function (to be optimized) the accuracy function (to assess the results), the optimizer (that will adjust the parameters of the model in the correct direction) and the transformations to be applied to each batch.

def train_loop(
epochs: int, 
train_dl: DataLoader, 
val_dl: Optional[DataLoader], 
model: nn.Module, 
loss_fn: Callable, 
optimizer: torch.optim.Optimizer, 
acc_fns: Optional[List]=None, 
batch_tfms: Optional[Callable]=None
):
# size = len(dataloader.dataset)
cuda_model = model.cuda()for epoch in range(epochs):
accum_loss = 0
for batch in train_dl:
if batch_tfms is not None:
batch = batch_tfms(batch)
X = batch['image'].cuda()
y = batch['mask'].type(torch.long).cuda()
pred = cuda_model(X)['out']
loss = loss_fn(pred, y)
# BackProp
optimizer.zero_grad()
loss.backward()
optimizer.step()
# update the accum loss
accum_loss += float(loss) / len(train_dl)
# Testing against the validation dataset
if acc_fns is not None and val_dl is not None:
# reset the accuracies metrics
acc = [0.] * len(acc_fns)
with torch.no_grad():
for batch in val_dl:
if batch_tfms is not None:
batch = batch_tfms(batch)                    
X = batch['image'].type(torch.float32).cuda()
y = batch['mask'].type(torch.long).cuda()
pred = cuda_model(X)['out']
for i, acc_fn in enumerate(acc_fns):
acc[i] = float(acc[i] + acc_fn(pred, y)/len(val_dl))
# at the end of the epoch, print the errors, etc.
print(f'Epoch {epoch}: Train Loss={accum_loss:.5f} - Accs={[round(a, 3) for a in acc]}')
else:
print(f'Epoch {epoch}: Train Loss={accum_loss:.5f}')

Before calling the training function, let’s create the loss and accuracy functions. In our specific case, we shall have predictions with shape (N, C, d1, d2) and we have masks with the shape (N, 1, d1, d2). For the loss function, normally the Cross Entropy Loss should work, but it requires the mask to have shape (N, d1, d2). In this case, we will need to squeeze our second dimension manually.

Additionally, we will create two accuracy functions. The overall accuracy, used in the original paper and the intersect over union. Usually when we have masks with unbalanced amount pixels in each class, as it is the case for water masks (sometimes we have scenes with just land and very few water bodies), the overall accuracy will result in unrealistic values. In this case, the OA should be avoided, but it is left here for comparison with the original paper.

The overall accuracy is calculated manually by adding all the matches and dividing by the number of elements in the batch. The IoU is also known as Jaccard Index and it is available in Sklearn package. The Pytorch’s cross entropy is used for loss, with a minor adjustment in the target’s shape. After all the necessary adjustments the functions are defined as:

from sklearn.metrics import jaccard_scoredef oa(pred, y):
flat_y = y.squeeze()
flat_pred = pred.argmax(dim=1)
acc = torch.count_nonzero(flat_y == flat_pred) / torch.numel(flat_y)
return acc
def iou(pred, y):
flat_y = y.cpu().numpy().squeeze()
flat_pred = pred.argmax(dim=1).detach().cpu().numpy()
return jaccard_score(flat_y.reshape(-1), flat_pred.reshape(-1), zero_division=1.)
def loss(p, t):    
return torch.nn.functional.cross_entropy(p, t.squeeze())

The training function can now be called like so:

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)
train_loop(5, train_dataloader, valid_dataloader, model, loss, optimizer, 
acc_fns=[oa, iou], batch_tfms=tfms)code output: 
Epoch 0: Train Loss=0.37275 - Accs=[0.921, 0.631]
Epoch 1: Train Loss=0.22578 - Accs=[0.94, 0.689]
Epoch 2: Train Loss=0.22280 - Accs=[0.906, 0.576]
Epoch 3: Train Loss=0.19370 - Accs=[0.944, 0.706]
Epoch 4: Train Loss=0.18241 - Accs=[0.92, 0.619]
Epoch 5: Train Loss=0.21393 - Accs=[0.956, 0.748]

From the results, we can see that the loss is dropping and accuracy increasing. So our training is working as expected. The first accuracy is the overall accuracy and the second is the IoU. Comparing our results after 10 epochs with the results obtained from the DeepLabV3 tested in the research paper (Luo et al. 2021), we have OA=95.6% and OA = 95.7% (mean obtained from the 3 distinct regions considered in the paper) respectively. Considering we started from arbitrary weights and did not performed any fine tuning on hyperparameters, such as regularization or learning rate, etc., we could say our results are very good. It would be interesting to compare this dataset with other water segmentation algorithms, as single index thresholding (MNDWI, AWEI, etc.) do not provide the best results, despite their simplicity (Cordeiro et al. 2021).

The complete notebook is available here and it can be opened directly in google Colab.

In this Part 3 we finished our project by providing a training loop to optimize the DL model (DeepLab V3) for the task of water segmentation using satellite imagery. The results are promising, but they could be improved by some fine tuning. Besides hyperparameters and more training , data augmentation could also be used to improve accuracy, as well as a different architecture, such as U-Net. It would also be interesting to check the quality of the outputs visually, to understand where the model is performing well and where it is missing the target. These topics were not covered in this story, but if you would like to see more stories like this, don’t hesitate in leaving your requests (and thoughts) in the comments.

Cordeiro, M.C.R., Martinez, J.-M., Peña-Luque, S., 2021. Automatic water detection from multidimensional hierarchical clustering for Sentinel-2 images and a comparison with Level 2A processors. Remote Sensing of Environment 253, 112209. https://doi.org/10.1016/j.rse.2020.112209

Xin Luo, Xiaohua Tong, Zhongwen Hu. An applicable and automatic method for earth surface water mapping based on multispectral images. International Journal of Applied Earth Observation and Geoinformation, 2021, 103, 102472. [Link]