
Deep Image Prior in PyTorch. Image Denoising with No Data and a Random Network | by Ta-Ying Cheng | Aug, 2022


Image Denoising with No Data and a Random Network

Figure 1. DIP Pipeline. A single image is used for training, and the aim is to reconstruct the image from the noise. Eventually the network learns to reconstruct a denoised version of the image. Image created by author.

Deep learning and neural networks have been tightly associated with big data. Whether it is image classification or language translation, you almost always require a vast quantity of data to boost task accuracy so the model is applicable to real-world datasets. Even under few-shot or one-shot scenarios, the prerequisite is that you still need a large variety of data to train the network. But what if I told you that you don’t need any data or any pre-trained network, and yet you can perform image restoration or even super-resolution?

In this article, we will dive into a completely different realm of deep networks, namely the deep image prior (DIP), which doesn’t require any dataset for training and yet learns to separate noise from image content to perform image restoration. A PyTorch tutorial is presented in detail to showcase the power of DIP.

Figure 1 is a simple illustration of how DIP works. It is unexpectedly simple. You start by having a randomly-initialised network that aims to reconstruct the target image from pure noise. The output reconstruction from the network is then compared with the original image to compute a loss function to subsequently update the network. After some iterations, you will be surprised to find that the network will start to output a “denoised” version of the original image.

In essence, our entire training process optimises the network to encode prior information about the image, hence the name “deep image prior”.

So why does this work?

Theoretically, a network should be able to pick up all aspects of an image, from coarse structure to fine detail, which also includes the inherent noise. In practice, however, the network is more likely to pick up coherent and consistent features within an image first, before finally picking up the noise and thus “overfitting” to the entire image. Hence, if we stop the training in the middle, before overfitting, the network output becomes a clean version of the original image, serving our image restoration purpose.

Libraries and Hardware Requirements

This implementation is built upon PyTorch and OpenCV. Normally, neural networks benefit from GPUs for parallel computation. However, because DIP trains on only the single image being denoised, a CPU is sufficient.

The following is the code for importing libraries and introducing GPUs (if any):
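The original code embed is not reproduced here; a minimal sketch of the setup might look like the following. The filename `noisy.png` and the 256×256 size are illustrative assumptions, so a synthetic image stands in for the real photo to keep the snippet standalone:

```python
import numpy as np
import torch

# Use a GPU if one is available; DIP also runs (more slowly) on CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# The article loads a real photo with OpenCV, roughly:
#   img = cv2.cvtColor(cv2.imread('noisy.png'), cv2.COLOR_BGR2RGB)
# ('noisy.png' is a placeholder name). A random image is used here instead.
img = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)

# HWC uint8 -> NCHW float tensor in [0, 1]; the batch dimension is added
# up front because the network expects it.
img_t = (torch.from_numpy(img).permute(2, 0, 1).float() / 255.0).unsqueeze(0).to(device)
print(img_t.shape)  # torch.Size([1, 3, 256, 256])
```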

Network Architecture

According to the original DIP paper, different network architectures perform differently. We create an hourglass network with skip connections, following the settings suggested in the paper.

The following is the implementation of the network:
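The author's exact implementation is not shown here; below is a simplified two-scale sketch of an hourglass (encoder-decoder) network with 1×1 skip connections. The layer widths and depth are illustrative placeholders, not the paper's reference configuration (which uses more scales and wider layers):

```python
import torch
import torch.nn as nn

class DIPSkipNet(nn.Module):
    """Two-scale hourglass with skip connections (illustrative sketch)."""

    def __init__(self, in_ch=3, out_ch=3, feat=64, skip_ch=4):
        super().__init__()

        def block(cin, cout, stride=1):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )

        self.enc1 = block(in_ch, feat, stride=2)   # H    -> H/2
        self.enc2 = block(feat, feat, stride=2)    # H/2  -> H/4
        # 1x1 convolutions feeding shallow features to the decoder.
        self.skip0 = nn.Conv2d(in_ch, skip_ch, 1)
        self.skip1 = nn.Conv2d(feat, skip_ch, 1)
        self.dec2 = block(feat + skip_ch, feat)
        self.dec1 = block(feat + skip_ch, feat)
        self.head = nn.Sequential(nn.Conv2d(feat, out_ch, 1), nn.Sigmoid())
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        # Upsample deep features and fuse them with the skip branches.
        d2 = self.dec2(torch.cat([self.up(e2), self.skip1(e1)], dim=1))  # H/2
        d1 = self.dec1(torch.cat([self.up(d2), self.skip0(x)], dim=1))   # H
        return self.head(d1)
```

The sigmoid head keeps the output in [0, 1], matching the normalised target image; input height and width should be divisible by 4 for the shapes to line up.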

Training

The training is rather unorthodox, as we only have one image and iteratively optimise against it. This means that we completely forgo PyTorch’s batch-training capability. Note that the network still expects a batch dimension, so we have to unsqueeze the image before computing the loss.

The following is the implementation of the training:
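A plain single-image loop along these lines would do the job; `train_dip`, `num_iters`, and `lr=0.01` are illustrative choices rather than the author's exact settings, and `net` and `img_t` refer to the network and the (1, C, H, W) target tensor from the earlier snippets:

```python
import torch
import torch.nn as nn

def train_dip(net, img_t, num_iters=2000, lr=0.01):
    """Fit `net` to a single noisy image from a fixed noise input."""
    # The input is fixed random noise; only the weights are optimised.
    z = torch.randn_like(img_t)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for it in range(num_iters):
        optimizer.zero_grad()
        out = net(z)                # reconstruction from noise
        loss = loss_fn(out, img_t)  # compare against the noisy target
        loss.backward()
        optimizer.step()
        if (it + 1) % 100 == 0:
            print(f'iter {it + 1}: loss {loss.item():.6f}')
    return net(z).detach()
```

Stopping early is the crucial trick: saving the output at several checkpoints (e.g. 100, 500, 1,000, and 2,000 iterations) and keeping a mid-training result is what yields the denoised image, since running to convergence reproduces the noise as well.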

Figure 2. DIP Results. Image created by author.

We provide the results after 100, 500, 1,000, and 2,000 iterations. As you can see from Figure 2, the DIP network first learns the clean features across the image, yielding a clean version of the image mid-training. However, as training progresses, while finer details are recovered, some of the noise is also reintroduced into the image.

Interestingly, after the DIP paper, Gandelsman et al. proposed a variant called Double-DIP, where they show that optimising two priors at the same time can encourage the network to learn features separately, leading to meaningful image decomposition and even foreground/background separation.

The full paper can be found here.

And there you have it! One network, one image, with no data at all and you can perform image denoising and restoration from scratch. The full implementation of DIP can be found here:

Thank you for making it this far 🙏! I will be posting more on different areas of computer vision/deep learning. Make sure to check out my other articles on computer vision methods too!

