Top 10 Pre-Trained Models for Image Embedding every Data Scientist Should Know | by Satyam Kumar | Apr, 2023

Image by Chen from Pixabay

Rapid developments in computer vision, particularly in image classification, have been further accelerated by the advent of transfer learning. Training a computer vision neural network on a large dataset of images takes a lot of computational resources and time.

Luckily, this time and these resources can be saved by using pre-trained models. The technique of leveraging feature representations from a pre-trained model is called transfer learning. Pre-trained models are generally trained using high-end computational resources and massive datasets.

The pre-trained models can be used in various ways:

  • Using the pre-trained weights and directly making predictions on the test data
  • Using the pre-trained weights for initialization and training the model using the custom dataset
  • Using only the architecture of the pre-trained network, and training it from scratch on the custom dataset
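
As a rough illustration, these three usage modes map to Keras code as follows (a minimal sketch using VGG16 and a hypothetical 10-class custom dataset; any of the models discussed below works the same way):

import tensorflow as tf

# 1) Direct prediction with the pre-trained ImageNet weights
pretrained = tf.keras.applications.VGG16(weights="imagenet")

# 2) Pre-trained weights as initialization, fine-tuned on a custom dataset
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False,
    input_shape=(224, 224, 3), pooling="avg",
)
outputs = tf.keras.layers.Dense(10, activation="softmax")(base.output)  # 10 classes is an assumption
fine_tune_model = tf.keras.Model(base.input, outputs)

# 3) Architecture only, trained from scratch with random weights
scratch_model = tf.keras.applications.VGG16(weights=None, classes=10)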

This article walks through the top 10 state-of-the-art pre-trained models for extracting image embeddings. All of these pre-trained models can be loaded as Keras models using the keras.applications API.

CNN architectures discussed in this article:
1) VGG
2) Xception
3) ResNet
4) InceptionV3
5) InceptionResNet
6) MobileNet
7) DenseNet
8) NasNet
9) EfficientNet
10) ConvNeXt

1) VGG

The VGG-16/19 networks, developed by the Visual Geometry Group at the University of Oxford, were introduced at ILSVRC 2014 and remain among the most popular pre-trained models.

There are two variants of the VGG model, a 16-layer and a 19-layer network, with VGG-19 (the 19-layer network) being an improvement over VGG-16 (the 16-layer network).

Architecture:

(Source), VGG-16 Network architecture

The VGG network is simple and sequential in nature and uses many filters. At each stage, small (3*3) filters are used to reduce the number of parameters.

The VGG-16 network has the following:

  • Convolutional Layers = 13
  • Pooling Layers = 5
  • Fully Connected Dense Layers = 3

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional image embedding (with include_top=True this is the 1000-way ImageNet class vector; set include_top=False to obtain a pooled feature embedding instead)

Other Details for VGG-16/19:

  • Paper Link: https://arxiv.org/pdf/1409.1556.pdf
  • GitHub: VGG
  • Published On: April 2015
  • Performance on ImageNet Dataset: 71% (Top 1 Accuracy), 90% (Top 5 Accuracy)
  • Number of Parameters: ~140M
  • Number of Layers: 16/19
  • Size on Disk: ~530MB

Implementation:

tf.keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code is for the VGG-16 implementation; Keras offers a similar API for VGG-19. For more details, refer to this documentation.
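
For embedding extraction specifically, dropping the classifier head gives a compact feature vector. A minimal sketch (the random array stands in for a real, properly loaded image batch):

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# Headless VGG-16: global average pooling yields a 512-dimensional embedding
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

batch = np.random.rand(1, 224, 224, 3) * 255.0  # stand-in for a real image batch
embeddings = model.predict(preprocess_input(batch))
print(embeddings.shape)  # (1, 512)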

2) Xception

Xception is a deep CNN architecture built on depthwise separable convolutions. In this framing, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers.

Architecture:

(Source), Xception architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional image embedding

Other Details for Xception:

  • Paper Link: https://arxiv.org/pdf/1610.02357.pdf
  • GitHub: Xception
  • Published On: April 2017
  • Performance on ImageNet Dataset: 79% (Top 1 Accuracy), 94.5% (Top 5 Accuracy)
  • Number of Parameters: ~30M
  • Depth: 81
  • Size on Disk: 88MB

Implementation:

  • Instantiate the Xception model using the below-mentioned code:
tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code is for the Xception implementation. For more details, refer to this documentation.
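
Note that Xception expects 299x299 inputs and its own preprocessing (pixels scaled to [-1, 1]), so use the matching preprocess_input. A minimal sketch with a random stand-in batch:

import numpy as np
from tensorflow.keras.applications.xception import Xception, preprocess_input

# Headless Xception: pooled features are 2048-dimensional
model = Xception(weights="imagenet", include_top=False, pooling="avg")
batch = np.random.rand(1, 299, 299, 3) * 255.0  # stand-in for real images
features = model.predict(preprocess_input(batch))
print(features.shape)  # (1, 2048)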

3) ResNet

Earlier CNN architectures were not designed to scale to many convolutional layers: adding layers caused vanishing gradients and limited the gains in performance.

The ResNet architecture offers skip connections to solve the vanishing gradient problem.

Architecture:

(Source), ResNet architecture

This ResNet model uses a 34-layer plain network architecture inspired by VGG-19, to which the shortcut connections are added; these shortcut connections turn the architecture into a residual network.
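
As an illustration of the shortcut idea, a simplified residual block in Keras might look like this (a sketch, not the exact block from the paper; it assumes the input already has `filters` channels):

from tensorflow.keras import layers

def residual_block(x, filters):
    # Main path: two 3x3 convolutions
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # Shortcut: add the block's input to its output, then apply the nonlinearity
    return layers.Activation("relu")(layers.Add()([x, y]))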

There are several versions of ResNet architecture:

  • ResNet50
  • ResNet50V2
  • ResNet101
  • ResNet101V2
  • ResNet152
  • ResNet152V2

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional image embedding

Other Details for ResNet models:

  • Paper Link: https://arxiv.org/pdf/1512.03385.pdf
  • GitHub: ResNet
  • Published On: Dec 2015
  • Performance on ImageNet Dataset: 75–78% (Top 1 Accuracy), 92–93% (Top 5 Accuracy)
  • Number of Parameters: 25–60M
  • Depth: 107–307
  • Size on Disk: ~100–230MB

Implementation:

  • Instantiate the ResNet50 model using the below-mentioned code:
tf.keras.applications.ResNet50(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    **kwargs
)

The above code is for the ResNet50 implementation; Keras offers a similar API for the other ResNet architectures. For more details, refer to this documentation.

4) InceptionV3

Stacking many deep convolutional layers resulted in overfitting of the data. To avoid overfitting, the Inception model uses parallel layers (multiple filters of different sizes on the same level), making the model wider rather than deeper. The Inception V1 module is made of 4 parallel branches: (1*1), (3*3), and (5*5) convolutions, plus (3*3) max pooling.
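
A toy InceptionV1-style module in the Keras functional style, to make the parallel-branch idea concrete (branch widths are arbitrary illustrative choices; the real module also uses 1x1 reductions before the larger convolutions):

from tensorflow.keras import layers

def inception_module(x, f1=64, f3=128, f5=32, fpool=32):
    # Four parallel branches over the same input
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    # Concatenate the branch outputs channel-wise
    return layers.Concatenate()([b1, b3, b5, bp])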

Inception (V1/V2/V3) is a CNN-based deep learning model developed by a team at Google. InceptionV3 is an advanced and optimized version of the InceptionV1 and V2 models.

Architecture:

The InceptionV3 model is made up of 42 layers. The InceptionV3 architecture is built up step by step from the following ideas:

  • Factorized Convolutions
  • Smaller Convolutions
  • Asymmetric Convolutions
  • Auxiliary Classifier
  • Grid Size Reduction

All these concepts are consolidated into the final architecture mentioned below:

(Source), InceptionV3 architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional image embedding

Other Details for InceptionV3 models:

  • Paper Link: https://arxiv.org/pdf/1512.00567.pdf
  • Published On: Dec 2015
  • Performance on ImageNet Dataset: ~78% (Top 1 Accuracy), ~94% (Top 5 Accuracy)
  • Number of Parameters: ~24M
  • Size on Disk: ~92MB

Implementation:

  • Instantiate the InceptionV3 model using the below-mentioned code:
tf.keras.applications.InceptionV3(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code is for the InceptionV3 implementation. For more details, refer to this documentation.

5) InceptionResNet

InceptionResNet-V2 is a CNN model developed by researchers at Google. The goal of this model was to reduce the complexity of InceptionV3 and explore the feasibility of using residual connections in the Inception model.

Architecture:

(Source), Inception-ResNet-V2 architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional image embedding

Other Details for Inception-ResNet-V2 models:

  • Paper Link: https://arxiv.org/pdf/1602.07261.pdf
  • Published On: Feb 2016
  • Performance on ImageNet Dataset: ~80% (Top 1 Accuracy), ~95% (Top 5 Accuracy)
  • Number of Parameters: ~56M
  • Size on Disk: ~215MB

Implementation:

  • Instantiate the Inception-ResNet-V2 model using the below-mentioned code:
tf.keras.applications.InceptionResNetV2(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
    **kwargs
)

The above code is for the Inception-ResNet-V2 implementation. For more details, refer to this documentation.

6) MobileNet

MobileNet is a streamlined architecture that uses depthwise separable convolutions to construct lightweight deep convolutional neural networks, providing an efficient model for mobile and embedded vision applications.
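
The core building block can be sketched as follows (a simplified illustration; the actual MobileNet block also uses batch normalization and ReLU6):

from tensorflow.keras import layers

def depthwise_separable_block(x, filters, stride=1):
    # Depthwise step: one 3x3 filter applied per input channel
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", activation="relu")(x)
    # Pointwise step: a 1x1 convolution mixes channels and sets the output width
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(x)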

Architecture:

(Source), Mobile-Net architecture

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional image embedding

Other Details for MobileNet models:

  • Paper Link: https://arxiv.org/pdf/1704.04861.pdf
  • Published On: April 2017
  • Performance on ImageNet Dataset: ~70% (Top 1 Accuracy), ~89% (Top 5 Accuracy)
  • Number of Parameters: ~4.3M
  • Size on Disk: ~16MB

Implementation:

  • Instantiate the MobileNet model using the below-mentioned code:
tf.keras.applications.MobileNet(
    input_shape=None,
    alpha=1.0,
    depth_multiplier=1,
    dropout=0.001,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
    **kwargs
)

The above code is for the MobileNet implementation; Keras offers a similar API for the other MobileNet architectures (MobileNet-V2, MobileNet-V3). For more details, refer to this documentation.

7) DenseNet

DenseNet is a CNN model developed to counter the accuracy degradation caused by vanishing gradients in very deep neural networks: because of the long distance between the input and output layers, information can vanish before reaching its destination.

Architecture:

A DenseNet architecture consists of dense blocks (3 in the illustration below). The layers between two adjacent blocks are referred to as transition layers; they change feature-map sizes via convolution and pooling.
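
Within a dense block, each layer receives the concatenated feature maps of all preceding layers. A toy sketch (simplified; the paper's version uses BN-ReLU-Conv composite layers):

from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    for _ in range(num_layers):
        # Each new layer sees the concatenation of all earlier feature maps
        y = layers.Conv2D(growth_rate, 3, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, y])
    return x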

(Source), DenseNet architecture

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional image embedding

Other Details for DenseNet models:

  • Paper Link: https://arxiv.org/pdf/1608.06993.pdf
  • Published On: Aug 2016
  • Performance on ImageNet Dataset: ~75–77% (Top 1 Accuracy), ~92–94% (Top 5 Accuracy)
  • Number of Parameters: ~8–20M
  • Size on Disk: ~33–80MB

Implementation:

  • Instantiate the DenseNet121 model using the below-mentioned code:
tf.keras.applications.DenseNet121(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code is for the DenseNet-121 implementation; Keras offers a similar API for the other DenseNet architectures (DenseNet-169, DenseNet-201). For more details, refer to this documentation.

8) NasNet

Google researchers designed the NASNet model, which frames finding the best CNN architecture as a reinforcement learning problem. The idea is to search a given space (number of layers, filter sizes, strides, output channels, etc.) for the best-performing combination of parameters.
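
As a toy illustration of the search idea only (NASNet itself trains an RNN controller with reinforcement learning rather than sampling at random; all names below are hypothetical):

import random

# Hypothetical search space over a few architectural choices
search_space = {
    "num_layers": [4, 8, 12],
    "filters": [32, 64, 128],
    "kernel_size": [3, 5],
}

def evaluate(config):
    # Placeholder for "train the candidate network and return validation accuracy"
    return random.random()

candidates = [{k: random.choice(v) for k, v in search_space.items()} for _ in range(10)]
best = max(candidates, key=evaluate)
print(best)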

Input: Image of dimensions (331, 331, 3)

Other Details for NasNet models:

  • Paper Link: https://arxiv.org/pdf/1707.07012.pdf
  • Published On: Apr 2018
  • Performance on ImageNet Dataset: 75–83% (Top 1 Accuracy), 92–96% (Top 5 Accuracy)
  • Number of Parameters: 5–90M
  • Depth: 389–533
  • Size on Disk: 23–343MB

Implementation:

  • Instantiate the NASNetLarge model using the below-mentioned code:
tf.keras.applications.NASNetLarge(
    input_shape=None,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code is for the NASNetLarge implementation; Keras offers a similar API for the other NasNet architecture (NASNetMobile). For more details, refer to this documentation.

9) EfficientNet

EfficientNet is a CNN architecture from researchers at Google that achieves better performance through a scaling method called compound scaling. This method scales all dimensions of the network (depth, width, and resolution) uniformly by a fixed compound coefficient.
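
In the paper's formulation, depth, width, and resolution grow as alpha^phi, beta^phi, and gamma^phi for a single compound coefficient phi, with the base coefficients found by grid search (alpha=1.2, beta=1.1, gamma=1.15). A small sketch:

# Compound scaling: one coefficient phi scales depth, width, and resolution together
alpha, beta, gamma = 1.2, 1.1, 1.15  # base coefficients from the EfficientNet paper

def compound_scale(phi):
    return {
        "depth_multiplier": alpha ** phi,
        "width_multiplier": beta ** phi,
        "resolution_multiplier": gamma ** phi,
    }

print(compound_scale(1))  # roughly EfficientNet-B1-level scaling
print(compound_scale(3))  # roughly EfficientNet-B3-level scaling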

Architecture:

(Source), Efficient-B0 architecture

Other Details for EfficientNet Models:

  • Paper Link: https://arxiv.org/pdf/1905.11946.pdf
  • Published On: May 2019
  • Performance on ImageNet Dataset: ~77–84% (Top 1 Accuracy), ~93–97% (Top 5 Accuracy) across B0–B7
  • Number of Parameters: ~5–66M
  • Size on Disk: ~29–256MB

Implementation:

  • Instantiate the EfficientNet-B0 model using the below-mentioned code:
tf.keras.applications.EfficientNetB0(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
    **kwargs
)

The above code is for the EfficientNet-B0 implementation; Keras offers a similar API for the other EfficientNet architectures (EfficientNet-B1 to B7, EfficientNet-V2-B0 to B3). For more details, refer to this documentation and this documentation.

10) ConvNeXt

The ConvNeXt model was proposed as a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, and it is claimed to outperform them.

Architecture:

(Source), ConvNeXt architecture

Other Details for ConvNeXt models:

  • Paper Link: https://arxiv.org/pdf/2201.03545.pdf
  • Published On: Jan 2022
  • Performance on ImageNet Dataset: ~81–87% (Top 1 Accuracy) across the Tiny to XLarge variants
  • Number of Parameters: ~29–350M

Implementation:

  • Instantiate the ConvNeXt-Tiny model using the below-mentioned code:
tf.keras.applications.ConvNeXtTiny(
    model_name="convnext_tiny",
    include_top=True,
    include_preprocessing=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code is for the ConvNeXt-Tiny implementation; Keras offers a similar API for the other ConvNeXt architectures (ConvNeXt-Small, ConvNeXt-Base, ConvNeXt-Large, ConvNeXt-XLarge). For more details, refer to this documentation.
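
One practical closing sketch: with include_preprocessing=True (the default), the Keras ConvNeXt models normalize raw [0, 255] pixels themselves, so no separate preprocess_input call is needed (the ConvNeXt models require a recent TensorFlow release):

import numpy as np
from tensorflow.keras.applications import ConvNeXtTiny

# Headless ConvNeXt-Tiny as an embedding extractor
model = ConvNeXtTiny(weights="imagenet", include_top=False, pooling="avg")

batch = np.random.rand(1, 224, 224, 3) * 255.0  # stand-in for real images
embeddings = model.predict(batch)  # raw pixels; preprocessing happens inside the model
print(embeddings.shape)  # (1, 768)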

