
3D Generative Modeling with DeepSDF

By Cameron Wolfe | January 2023



(Photo by Milad Fakurian on Unsplash)

Prior research in computer graphics and 3D computer vision has proposed numerous approaches for representing 3D shapes. Such methods are useful for:

  1. storing memory-efficient representations of known shapes
  2. generating new shapes
  3. fixing/reconstructing shapes based on limited or noisy data

Beyond classical approaches, deep learning — or, more specifically, generative neural networks — can be used to represent 3D shapes. To do this, we can train a neural network to output a representation of a 3D shape, allowing representations for a variety of shapes to be indirectly stored within the weights of the neural network. Then, we can query this neural network to produce new shapes.

Within this post, we will study one such method, called DeepSDF [1], which uses a simple, feed-forward neural network to learn signed distance function (SDF) representations for a variety of 3D shapes. The basic idea is simple: instead of directly encoding a geometry (e.g., via a mesh), we train a generative neural network to output this geometry. Then, we can perform inference to (i) obtain the direct encoding of a (potentially new) 3D shape or (ii) fix/reconstruct a 3D shape from noisy data.

(from [1])

Before diving into how DeepSDF works, there are a few background concepts that we will need to understand. First, we’ll talk a bit about how 3D shapes are usually represented, as well as how a signed distance function (SDF) can be used to represent a 3D shape. Then, we’ll talk about feed-forward neural networks, an incredibly simple deep learning architecture that is used heavily by research in 3D modeling of shapes.

When considering how to store a 3D shape in a computer, we have three options: a point cloud, a mesh, or voxels. Each of these representations has different benefits and limitations, but all of them are valid methods of directly representing a 3D shape. Let’s get a basic idea of how they work.

Point cloud. Point clouds are easy to understand. As the name suggests, they store a group of points with [x, y, z] coordinates that together represent an underlying geometry. Point clouds are useful because they closely match the type of data we would get from sensors like LiDAR or depth-sensing cameras. But point clouds do not provide a watertight surface (i.e., a shape with a single, closed surface).

Mesh. One 3D representation that can provide a watertight surface is a mesh. Meshes are 3D shape representations based upon collections of vertices, edges, and faces that describe an underlying shape. Put simply, a mesh is just a list of polygons (e.g., triangles) that, when stitched together, form a 3D geometry.

Voxel-based representation. Voxels are just pixels with volume. Instead of a pixel in a 2D image, we have a voxel (i.e., a cube) in 3D space. To represent a 3D shape with voxels, we can:

  1. Divide a section of 3D space into discrete voxels
  2. Identify whether each voxel is filled or not

Using this simple technique, we can construct a voxel-based 3D object. To get a more accurate representation, we can just increase the number of voxels that we use, forming a finer discretization of 3D space. See below for an illustration of the difference between point clouds, meshes, and voxels.

(from [3])
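To make the two voxelization steps above concrete, here is a minimal sketch (my own illustration, not code from the references); it assumes the points already lie in the unit cube:

```python
import numpy as np

def voxelize(points: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Convert an (N, 3) point cloud in the unit cube to a binary occupancy grid."""
    # Step 1: map each [x, y, z] coordinate to a voxel index in [0, resolution - 1]
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    # Step 2: a voxel is "filled" if at least one point falls inside of it
    grid = np.zeros((resolution, resolution, resolution), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

points = np.random.rand(1000, 3)          # stand-in for real sensor data
coarse = voxelize(points, resolution=16)  # coarse discretization of 3D space
fine = voxelize(points, resolution=64)    # finer discretization, 64x the memory
```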

Directly storing a 3D shape using a point cloud, mesh, or voxels requires a lot of memory. Instead, we will usually want to store an indirect representation of the shape that’s more efficient. One approach for this would be to use a signed distance function (SDF).

Given a spatial [x, y, z] point as input, SDFs will output the distance from that point to the nearest surface of the underlying object being represented. The sign of the SDF’s output indicates whether that spatial point is inside (negative) or outside (positive) of the object’s surface. See the equation below.

SDF(x) = s, where x ∈ ℝ³ is a query point and s ∈ ℝ is its signed distance to the surface

(created by author)

We can identify the surface of a 3D object by finding the locations at which the SDF is equal to zero, indicating that a given point is at the boundary of the object. After finding this surface using the SDF, we can generate a mesh by using algorithms like Marching Cubes.

Why is this useful? At a high level, SDFs allow us to store a function instead of a direct representation of the 3D shape. This function is likely more efficient to store, and we can use it to recover a mesh representation anyway!
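As a toy illustration of the sign convention (my own example, not from [1]), consider the SDF of a sphere, which has a simple closed form:

```python
import numpy as np

def sphere_sdf(p: np.ndarray, center: np.ndarray, radius: float) -> float:
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return float(np.linalg.norm(p - center) - radius)

origin = np.zeros(3)
print(sphere_sdf(np.array([0.0, 0.0, 0.5]), origin, radius=1.0))  # -0.5 -> inside
print(sphere_sdf(np.array([0.0, 0.0, 2.0]), origin, radius=1.0))  #  1.0 -> outside
print(sphere_sdf(np.array([0.0, 1.0, 0.0]), origin, radius=1.0))  #  0.0 -> on the surface
```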

Many highly accurate methods for modeling 3D shapes are based upon feed-forward network architectures. Such an architecture takes a vector as input and applies the same two transformations within each of the network’s layers:

  1. Linear transformation
  2. Non-linear activation function

Though the dimension of our input is fixed, two aspects of the network architecture are free for us to choose: the hidden dimension and the number of layers. Variables like this that we, as practitioners, are expected to set are called hyperparameters. The correct setting of these hyperparameters depends upon the problem and/or application we are trying to solve.

The code. There is not much complexity to feed-forward networks. We can implement them easily in PyTorch as shown below.
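Here is one minimal way to write such a network; the hidden dimension, depth, and choice of activation are illustrative, not prescribed by [1]:

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """A stack of layers, each applying a linear transformation and a non-linearity."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int, num_layers: int):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(num_layers - 1):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]  # (1) linear, (2) activation
            dim = hidden_dim
        layers.append(nn.Linear(dim, out_dim))  # final layer: no activation
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# hidden_dim and num_layers are the hyperparameters we, as practitioners, must choose
model = FeedForward(in_dim=3, hidden_dim=256, out_dim=1, num_layers=4)
out = model(torch.randn(8, 3))  # batch of eight 3D points -> (8, 1) outputs
```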

(from [1])

Prior research in computer graphics and 3D computer vision has proposed numerous classical approaches for representing 3D shapes and geometries. In [1], the authors propose a deep learning-based approach, called DeepSDF, that uses a neural network to learn a continuous SDF for a broad class of shapes. Put simply, this means that we can encode an SDF-based representation of many different types of 3D shapes using a single, feed-forward neural network, allowing such shapes to be represented, interpolated, or even completed from partial data; see above.

The idea behind DeepSDF is simple: we want to use a neural network to perform regression directly on the values of an SDF. To do this, we train this model over point samples from the SDF (i.e., individual [x, y, z] points with an associated SDF value). If we train a network in this way, then we can easily predict the SDF values of query positions, as well as recover a shape’s surface by finding the points at which the SDF is equal to zero.

How do we represent the shape? More specifically, consider a single shape, from which we sample a fixed number of 3D points with associated SDF values. Taking more point samples allows the shape to be represented with higher precision, but at the cost of increased compute.

X := {(x, s) : SDF(x) = s}

(created by author)

In the equation above, x is a vector containing [x, y, z] coordinates, while s is the SDF value associated with these coordinates for a given shape.

Training the neural network. From here, we can train a feed-forward neural network to produce the SDF value s given x as input, using an L1 regression loss over these sample pairs. The resulting model can then output accurate SDF values to represent the underlying shape; see the left subfigure below.

(from [1])

The limitation of such a model is that it only represents a single shape. Ideally, we would like to model a variety of shapes with a single neural network. To accomplish this, we can associate a latent vector (i.e., the “Code” in the figure above) with each shape. This is a low-dimensional vector, unique to each shape, that is stored alongside our neural network. It is provided as an extra input to tell the network which shape it is producing output for. This simple trick allows us to represent multiple shapes within a single model, which saves a lot of memory; see the right subfigure above.
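Concretely, a latent-conditioned decoder might look like the sketch below (my own illustration; the code size, width, and depth are hyperparameters, and real DeepSDF implementations add details like skip connections and weight normalization that are omitted here):

```python
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    """Maps a (latent code, query point) pair to a predicted SDF value."""

    def __init__(self, latent_dim: int = 256, hidden_dim: int = 512, num_layers: int = 8):
        super().__init__()
        layers, dim = [], latent_dim + 3  # input: shape code concatenated with [x, y, z]
        for _ in range(num_layers - 1):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        layers.append(nn.Linear(dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, code: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        # code: (B, latent_dim), xyz: (B, 3) -> predicted SDF values: (B,)
        return self.net(torch.cat([code, xyz], dim=-1)).squeeze(-1)
```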

The final question we might ask is: how do we obtain the latent vector for each shape? In [1], the authors propose an auto-decoder architecture that (i) adds the latent vector to the model’s input and (ii) learns the best latent vector for each shape via gradient descent during training; see below.

(from [1])

Typically, latent vectors are learned with an autoencoder architecture, but this requires an extra encoder module that incurs additional computational expense. The authors in [1] propose the auto-decoder approach to avoid this extra compute; the difference between the two approaches is illustrated in the figure above.
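Here is a minimal sketch of the auto-decoder training loop, reusing the SDFDecoder from the previous snippet; the loader yielding (shape index, points, SDF values) batches is hypothetical, and the hyperparameter values are illustrative. The key detail is that the latent codes are ordinary learnable parameters, updated by the same gradient descent that trains the network:

```python
import torch
import torch.nn.functional as F

num_shapes, latent_dim = 1000, 256
decoder = SDFDecoder(latent_dim)
codes = torch.nn.Embedding(num_shapes, latent_dim)  # one learnable code per training shape
torch.nn.init.normal_(codes.weight, std=0.01)

# Both the network weights and the latent codes receive gradient updates
opt = torch.optim.Adam(list(decoder.parameters()) + list(codes.parameters()), lr=1e-4)

for shape_idx, xyz, sdf_true in loader:  # hypothetical loader of (shape id, points, SDF values)
    z = codes(shape_idx)                 # look up the codes for this batch: (B, latent_dim)
    sdf_pred = decoder(z, xyz)
    loss = F.l1_loss(sdf_pred, sdf_true) # L1 regression loss on SDF values
    opt.zero_grad()
    loss.backward()
    opt.step()
```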

Producing a shape. To perform inference with DeepSDF, we must:

  1. Start with a sparse/incomplete set of SDF value samples
  2. Determine the best possible latent vector from these samples
  3. Perform inference with our trained neural network over many points in 3D space to determine their SDF values

From here, we can visualize the shape represented by DeepSDF with algorithms like Marching Cubes that discretize 3D space and extract an actual 3D geometry based on these SDF values.
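Here is a sketch of steps (2) and (3) under the same assumptions as the previous snippets (the trained decoder from above, with placeholder observations standing in for a real scan):

```python
import torch
import torch.nn.functional as F

# Step (1): sparse SDF observations of the new shape (placeholders standing in for a scan;
# points sampled on the surface have an SDF value of zero)
xyz_obs = torch.randn(500, 3)
sdf_obs = torch.zeros(500)

# Step (2): freeze the trained decoder and optimize a fresh latent code to fit the samples
for p in decoder.parameters():
    p.requires_grad_(False)
z = torch.zeros(1, 256, requires_grad=True)
opt = torch.optim.Adam([z], lr=5e-3)
for _ in range(800):
    loss = F.l1_loss(decoder(z.expand(len(xyz_obs), -1), xyz_obs), sdf_obs)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step (3): evaluate the SDF on a dense grid of query points
lin = torch.linspace(-1, 1, 64)
grid_xyz = torch.cartesian_prod(lin, lin, lin)  # (64^3, 3) query points
with torch.no_grad():
    sdf_grid = decoder(z.expand(len(grid_xyz), -1), grid_xyz).reshape(64, 64, 64)
# The zero level set of sdf_grid can now be meshed with Marching Cubes
# (e.g., skimage.measure.marching_cubes)
```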

The data. DeepSDF is trained and evaluated on the synthetic ShapeNet dataset. In particular, its performance is measured across four tasks:

  1. Representing shapes in the training set
  2. Reconstructing unseen (test) shapes
  3. Completing partial shapes
  4. Sampling new shapes from the latent space

For the first three tasks, we see that DeepSDF tends to consistently outperform baseline methods, revealing that it can represent complex shapes with high accuracy and even recover shapes from incomplete samples quite well. This is quite remarkable given that we are storing numerous 3D shapes within a single, memory-efficient neural network; see below.

(from [1])

We can also interpolate the embedding space of a DeepSDF model to produce coherent results. This allows us to do things like find the average shape between a truck and a car; see below.

(from [1])

From these results, we can see that interpolation between latent vectors yields a smooth transition between shapes, revealing that the continuous SDFs embedded by DeepSDF are meaningful! Common features of shapes — such as truck beds or arms of chairs — are captured within the representation leveraged by DeepSDF. This is quite remarkable for such a simple, feed-forward network.

DeepSDF is a feed-forward, generative neural network that we can use to represent and manipulate 3D shapes. Using this model, we can easily perform tasks like generating the mesh representation of a shape, recovering an underlying shape from incomplete or noisy data, and even generating a new shape that interpolates between known geometries. The benefits and limitations of DeepSDF are outlined below.

Lots of compression. To store 3D geometries in a computer, we can use mesh or voxel representations. To avoid the memory overhead of directly storing shapes like this, we can use a generative model like DeepSDF. With such an approach, we no longer need the direct mesh encoding of a geometry. Instead, we can use DeepSDF — a small neural network that is easy to store — to accurately generate meshes for a variety of shapes.

Fixing a broken geometry. Given a partial or noisy representation of an underlying shape, DeepSDF can be used to recover an accurate mesh; see below. In comparison, most prior methods cannot perform such a task — they require access to a full 3D shape representation that matches the type of data used to train the model.

(from [1])

Interpolating the latent space. DeepSDF can represent many different shapes and embed their properties into a low-dimensional latent space. Plus, experiments show that this latent space is meaningful and has good coverage. Practically, this means that we can take latent vectors (i.e., vector representations of different objects), linearly interpolate between them, and produce a valid, novel shape; see the sketch below. This gives us an easy way to generate new shapes with a variety of interesting properties.
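Concretely, latent-space interpolation is just a weighted average of two learned codes, decoded as usual. This sketch reuses the decoder, codes, and grid_xyz from the earlier snippets; the shape indices are hypothetical:

```python
import torch

with torch.no_grad():
    z_car = codes(torch.tensor([13]))    # learned code for a car (hypothetical index)
    z_truck = codes(torch.tensor([42]))  # learned code for a truck (hypothetical index)
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        z = (1 - alpha) * z_car + alpha * z_truck  # linear interpolation in latent space
        sdf_grid = decoder(z.expand(len(grid_xyz), -1), grid_xyz).reshape(64, 64, 64)
        # Each sdf_grid can be meshed with Marching Cubes to visualize the blend
```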

Limitations. DeepSDF is great, but it always requires access to a (possibly noisy or incomplete) 3D geometry to run inference. Plus, searching for the best possible latent vector, which must always be done before inference due to the auto-decoder approach, is computationally expensive. In this way, the inference abilities of DeepSDF are somewhat limited: the approach is slow and cannot generate new shapes from scratch, which leaves room for improvement in future work.

Closing remarks

Thanks so much for reading this article. I am Cameron R. Wolfe, a research scientist at Alegion and a PhD student at Rice University studying the empirical and theoretical foundations of deep learning. You can also check out my other writings on Medium! If you liked this post, please follow me on Twitter or subscribe to my Deep (Learning) Focus newsletter, where I write a series of understandable overviews of important deep learning topics.

Bibliography

[1] Park, Jeong Joon, et al. “DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[2] Mildenhall, Ben, et al. “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.” Communications of the ACM 65.1 (2021): 99–106.

[3] Hoang, Long, et al. “A deep learning method for 3D object classification using the wave kernel signature and a center point of the 3D-triangle mesh.” Electronics 8.10 (2019): 1196.



