Geometric Deep Learning on Groups

by Jason McEwen, March 2023


Ideally geometric deep learning techniques on groups would encode equivariance to group transformations, to provide well-behaved representation spaces and excellent performance, while also being computationally efficient. However, no single approach provides both of these desirable properties. Continuous approaches offer excellent equivariance but with a very large computational cost. Discrete approaches are typically relatively computationally efficient but sacrifice equivariance. We point towards future techniques that achieve the best of both worlds.

Photo by Serg Antonov on Unsplash

Deep learning on groups is a rapidly growing area of geometric deep learning (see our recent TDS article on A Brief Introduction to Geometric Deep Learning). Groups include homogeneous spaces with global symmetries, with the archetypical example being the sphere.

Practical applications of geometric deep learning on groups are prevalent, particularly for the sphere. For example, spherical data arise in myriad applications, not only when data is acquired directly on the sphere (such as over the Earth or by 360° cameras that capture panoramic photos and videos), but also when considering spherical symmetries (such as in molecular chemistry or magnetic resonance imaging).

We need deep learning techniques on groups that are both highly effective and scalable to huge datasets of high-resolution data. In general this problem remains unsolved.

An example of spherical data. [Photo by NASA on Unsplash]

One of the reasons deep learning techniques have been so effective is the inductive biases encoded in modern architectures.

One particularly powerful inductive bias is to encode symmetries that the data are known to satisfy (as elaborated in our TDS article What Einstein Can Teach Us About Machine Learning). Convolutional neural networks (CNNs), for example, encode translational symmetry or, more precisely, translational equivariance, as illustrated in the diagram below.

Illustration of translational equivariance. Given an image (top left), applying a convolutional kernel (𝒜) to obtain a feature map (top right) and then translating (𝒯) the feature map (bottom right) is equivalent to first translating the image (bottom left) and then applying the convolution kernel (bottom right). [Original figure created by authors.]
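This commutation property is straightforward to verify numerically. Below is a minimal sketch (our own illustration, not from any particular library) checking that applying a convolution 𝒜 and then a translation 𝒯 gives the same result as translating first and convolving after, using periodic boundaries so that translation is an exact symmetry:

```python
import numpy as np
from scipy.ndimage import correlate

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((3, 3))

def conv(x):
    # 𝒜: correlation with periodic boundaries, so translation is an exact symmetry
    return correlate(x, kernel, mode="wrap")

def translate(x, dy=5, dx=3):
    # 𝒯: cyclic translation of the image
    return np.roll(x, shift=(dy, dx), axis=(0, 1))

# 𝒯∘𝒜 equals 𝒜∘𝒯: the diagram above commutes
print(np.allclose(translate(conv(image)), conv(translate(image))))  # True
```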

Encoding equivariance in deep learning architectures results in well-behaved feature spaces where learning can be performed very effectively.

For geometric deep learning on groups we would therefore like to encode equivariance to various group transformations, which typically results in very good performance. However, in the general group setting this becomes highly computationally demanding — prohibitively so in many cases.

How to encode equivariance in deep learning architectures on groups in a computationally scalable manner is an active area of research.

The notion of convolution, which is responsible for the huge success of CNN architectures for planar images, naturally encodes equivariance and can be generalised to the group setting.

The group convolution of a signal (i.e. data, feature map) f defined over the group, with a filter 𝝭, is given by

(f ⋆ 𝝭)(g) = ∫_G f(u) 𝝭(g⁻¹u) dµ(u),

where g is an element of the group G and dµ(u) is the (Haar) measure of integration. The above expression is entirely analogous to convolution in the more common planar setting. We apply a transformation to the filter (a translation for planar CNNs), take the product with the signal of interest, and then sum, i.e. integrate.
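To make this concrete, consider the finite cyclic group ℤ/nℤ, for which the Haar integral reduces to a plain sum and the group convolution is simply circular convolution. A minimal sketch (our own illustration; the names are arbitrary):

```python
import numpy as np

# Group convolution on the cyclic group Z_n: the group operation is
# addition modulo n, so g⁻¹u becomes (u - g) mod n and the Haar
# integral becomes a sum over the n group elements.
def group_conv(f, psi):
    n = len(f)
    return np.array([
        sum(f[u] * psi[(u - g) % n] for u in range(n))
        for g in range(n)
    ])

rng = np.random.default_rng(1)
f, psi = rng.standard_normal(8), rng.standard_normal(8)

# Equivariance: shifting f and then convolving equals convolving
# and then shifting the result.
g0 = 3
print(np.allclose(group_conv(np.roll(f, g0), psi),
                  np.roll(group_conv(f, psi), g0)))  # True
```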

On the sphere we consider transformations given by 3D rotations, and so the convolution of a signal on the sphere reads

(f ⋆ 𝝭)(R) = ∫_𝕊² f(ω) 𝝭(R⁻¹ω) dω,

where R denotes a rotation and ω spherical coordinates.

Once a convolution on the group is defined, we can then construct a CNN on the group in a manner analogous to standard planar CNNs. That is, by composing convolutions and pointwise non-linear activations (also with pooling and normalisation layers, appropriately constructed on the group).
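As a hedged illustration of this recipe, here is a two-layer group CNN in miniature on the cyclic group: each layer composes a group convolution (a vectorised equivalent of the sketch above) with a pointwise ReLU, and because both building blocks respect cyclic shifts, so does the whole network:

```python
import numpy as np

def gconv(f, psi):
    # Group convolution on Z_n: (f ⋆ psi)(g) = Σ_u f(u) psi((u - g) mod n).
    return np.array([f @ np.roll(psi, g) for g in range(len(f))])

def network(f, psi1, psi2):
    h = np.maximum(gconv(f, psi1), 0.0)      # layer 1: convolution + ReLU
    return np.maximum(gconv(h, psi2), 0.0)   # layer 2: convolution + ReLU

rng = np.random.default_rng(2)
f, psi1, psi2 = (rng.standard_normal(8) for _ in range(3))

# The composed network remains equivariant to cyclic shifts.
g0 = 2
print(np.allclose(network(np.roll(f, g0), psi1, psi2),
                  np.roll(network(f, psi1, psi2), g0)))  # True
```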

The question then remains: how do we compute the group convolution in practice?

On one hand, we’d like the implementation to accurately capture the equivariance properties of the convolution. On the other hand, we’d like it to be highly computationally efficient. As we will see, existing approaches typically satisfy one of these requirements but not both simultaneously.

Existing approaches can be broadly categorised into discrete and continuous approaches.

Discrete approaches work with a discrete version of the data, typically either pixels or a graph representation, which can often be highly computationally efficient. However, in general regular discretizations do not exist.

Taking the sphere as an example, it is well known that a regular discretization of the sphere does not exist. Consequently, there is no way to discretise the sphere in a manner that is invariant to rotations, as illustrated in the diagram below.

Rotating a set of pixels on the sphere results in a set of pixels that cannot be overlaid on the existing set. This is true for all samplings of the sphere. [Original figure created by authors.]
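This failure is easy to check numerically: rotate the points of an equiangular grid by a generic rotation and measure how far each rotated sample lands from its nearest original grid point. A small sketch (the grid sizes and rotation angles are arbitrary choices):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

# Build an equiangular grid of points on the sphere.
theta = np.linspace(0.05, np.pi - 0.05, 16)    # colatitude
phi = np.arange(32) * 2.0 * np.pi / 32         # longitude
T, P = np.meshgrid(theta, phi, indexing="ij")
points = np.stack([np.sin(T) * np.cos(P),
                   np.sin(T) * np.sin(P),
                   np.cos(T)], axis=-1).reshape(-1, 3)

# Apply a generic 3D rotation (not aligned with the grid's symmetry axis).
rotated = Rotation.from_euler("zyz", [0.3, 0.7, 0.1]).apply(points)

# Nearest-neighbour distances from rotated samples to the original grid
# are far from zero: the rotated pixels cannot be overlaid on the grid.
dists, _ = cKDTree(points).query(rotated)
print(dists.max() > 1e-3)  # True
```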

Capturing strict equivariance with operations defined directly on the discretized space is simply not possible.

Discrete approaches therefore offer favourable computational performance, but at the cost of equivariance.

As an alternative to the discrete approaches discussed above, a continuous representation of the underlying signal can also be considered.

Functions on the sphere can be represented by an expansion in terms of spherical harmonics (illustrated below). For a bandlimited signal, it is possible to capture all of the signal’s information content in a finite set of samples, from which spherical harmonic coefficients can be computed exactly [1]. This is the analogue of the well-known Nyquist-Shannon sampling theorem, extended to the sphere.

Spherical harmonic functions. [Image sourced from Wikimedia Commons.]

Since the sphere is a compact manifold, its harmonic space is discrete. By working with a finite spherical harmonic space representation it is therefore possible to access the underlying continuous signal.
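As a rough illustration of this finite harmonic representation, the sketch below samples a bandlimited signal and recovers its spherical harmonic coefficients by quadrature (Gauss-Legendre in colatitude, equispaced in longitude); exact sampling theorems such as [1] formalise constructions of this kind. The grid sizes and the use of SciPy's sph_harm are our own choices for illustration:

```python
import numpy as np
from scipy.special import sph_harm

L = 8                                        # bandlimit (degrees l < L)
x, w = np.polynomial.legendre.leggauss(L)    # quadrature nodes/weights in cos(theta)
theta, phi = np.arccos(x), np.arange(2 * L) * np.pi / L
T, P = np.meshgrid(theta, phi, indexing="ij")

# A bandlimited test signal: the single spherical harmonic Y_{2,1}.
f = sph_harm(1, 2, P, T)   # SciPy argument order: (m, l, azimuth, colatitude)

def coeff(l, m):
    # f_lm = ∫ f(ω) Y_lm*(ω) dω, computed by quadrature over the samples.
    integrand = f * np.conj(sph_harm(m, l, P, T))
    return (np.pi / L) * np.sum(integrand * w[:, None])

print(abs(coeff(2, 1) - 1.0) < 1e-12)   # True: recovers the Y_{2,1} coefficient
print(abs(coeff(3, 2)) < 1e-12)         # True: other coefficients vanish
```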

Various spherical CNN architectures have been constructed where convolutions are computed through their harmonic representations [2–6]. By accessing the underlying continuous signal, these approaches achieve excellent equivariance properties. However, they involve repeatedly performing spherical harmonic transforms, which is computationally costly.
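To give a flavour of how these harmonic-space convolutions work: for an axisymmetric (zonal) filter, spherical convolution reduces to a cheap per-degree product of harmonic coefficients, so each layer wraps this product between forward and inverse spherical harmonic transforms, and it is those transforms that dominate the cost. A hedged sketch (the data structures are our own choices; the coefficient relation is the standard spherical convolution theorem):

```python
import numpy as np

L = 8  # bandlimit

def zonal_conv_harmonic(flm, psi_l0):
    # For a zonal filter psi, spherical convolution in harmonic space:
    #   (f ⋆ psi)_{lm} = sqrt(4π / (2l + 1)) · f_{lm} · conj(psi_{l0}).
    scale = np.sqrt(4.0 * np.pi / (2.0 * np.arange(L) + 1.0))
    return {(l, m): scale[l] * c * np.conj(psi_l0[l])
            for (l, m), c in flm.items()}

rng = np.random.default_rng(0)
flm = {(l, m): rng.standard_normal() + 1j * rng.standard_normal()
       for l in range(L) for m in range(-l, l + 1)}   # signal coefficients
psi_l0 = rng.standard_normal(L)                       # zonal filter coefficients

# The per-coefficient product is cheap; the surrounding spherical
# harmonic transforms are the expensive part of each layer.
conv_lm = zonal_conv_harmonic(flm, psi_l0)
```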

Continuous approaches capture rotational equivariance accurately but are computationally demanding.

As we have seen above, a dichotomy exists between discrete and continuous approaches, as illustrated in the diagram below. Ideally, we’d like techniques that are both equivariant and computationally scalable.

However, continuous approaches offer equivariance but with a large computational cost. Discrete approaches, on the other hand, are typically relatively computationally efficient but sacrifice equivariance.

Dichotomy between continuous and discrete geometric deep learning techniques on groups. [Original figure created by authors.]

We desire geometric deep learning techniques on groups that provide equivariance (which typically translates to well-behaved representation spaces and excellent performance) and are also computationally scalable.

In our next post we will describe a new hybrid discrete-continuous (DISCO) approach, recently accepted for ICLR [7], that achieves precisely these goals. By keeping some components of the representation continuous we achieve excellent equivariance properties, while other components are discretized to provide highly efficient, scalable computation.

[1] McEwen & Wiaux, A novel sampling theorem on the sphere, IEEE TSP (2012), arXiv:1110.6298

[2] Cohen, Geiger, Koehler, Welling, Spherical CNNs, ICLR (2018), arXiv:1801.10130

[3] Esteves, Allen-Blanchette, Makadia, Daniilidis, Learning SO(3) Equivariant Representations with Spherical CNNs, ECCV (2018), arXiv:1711.06721

[4] Kondor, Lin, Trivedi, Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network, NeurIPS (2018), arXiv:1806.09231

[5] Cobb, Wallis, Mavor-Parker, Marignier, Price, d’Avezac, McEwen, Efficient Generalised Spherical CNNs, ICLR (2021), arXiv:2010.11661

[6] McEwen, Wallis, Mavor-Parker, Scattering Networks on the Sphere for Scalable and Rotationally Equivariant Spherical CNNs, ICLR (2022), arXiv:2102.02828

[7] Ocampo, Price, McEwen, Scalable and equivariant spherical CNNs by discrete-continuous (DISCO) convolutions, ICLR (2023), arXiv:2209.13603

