
Generate a 3D Mesh from an Image with Python | by Mattia Gatti | Oct, 2022



Combine Deep Learning with 3D data processing to generate a mesh

Photo by Alvaro Pinot on Unsplash

Generating a 3D mesh from a single 2D image seemed a very hard task only a few years ago. Nowadays, thanks to advances in Deep Learning, multiple monocular depth estimation models have been developed that can provide a precise depth map from any image. Through this map, it’s possible to generate a mesh by performing surface reconstruction.

Introduction

Monocular depth estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. The output of a monocular depth estimation model is a depth map, which is basically a matrix, where each element corresponds to the predicted depth of the associated pixel in the input image.

A depth map. Image by the author.

The points in a depth map can be seen as a collection of points with 3-axis coordinates. Since the map is a matrix, each element has x and y components (its column and row indices), while its z component is the stored value, i.e. the predicted depth at the point (x, y). In the field of 3D data processing, a list of (x, y, z) points is called a point cloud.

A point cloud. Original file by Open3D.

Starting from an unstructured point cloud, it is possible to obtain a mesh. A mesh is a 3D object representation consisting of a collection of vertices and polygons. The most common type is the triangle mesh, which comprises a set of three-dimensional triangles connected by their common edges or vertices. The literature offers several methods to obtain a triangle mesh from a point cloud; the most popular are Alpha shape¹, Ball pivoting², and Poisson surface reconstruction³. These methods are known as surface reconstruction algorithms.

A triangle mesh. Original file by Open3D.

The procedure used in this guide to generate a mesh from an image is made up of three phases:

  1. Depth estimation — the depth map of the input image is generated using a monocular depth estimation model.
  2. Point cloud construction — the depth map is converted into a point cloud.
  3. Mesh generation — from the point cloud, a mesh is generated by using a surface reconstruction algorithm.

To follow the steps of the procedure illustrated in this guide, you need an image. In case you haven’t got one at your fingertips, you can download this one:

A Bedroom. Image from NYU-Depth V2.

1. Depth estimation

The monocular depth estimation model chosen for this guide is GLPN⁴. It is available on the Hugging Face Model Hub, and models can be retrieved from the hub using Hugging Face’s Transformers library.

To install the latest version of Transformers from PyPI, use:

pip install transformers

The following code estimates the depth of an input image:
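
The original snippet is not reproduced here, so below is a minimal sketch of this step. It assumes the NYU-trained checkpoint vinvino02/glpn-nyu from the Hugging Face Hub, a local file image.jpg as input, and a 16-pixel border for the final center crop; these are placeholder choices, not necessarily the author’s exact settings.

import numpy as np
import torch
from PIL import Image
from transformers import GLPNFeatureExtractor, GLPNForDepthEstimation

# load the input image (placeholder path)
image = Image.open("image.jpg")

# resize so that both height and width are multiples of 32
new_height = 480 if image.height > 480 else image.height
new_height -= (new_height % 32)
new_width = int(new_height * image.width / image.height)
diff = new_width % 32
new_width = new_width - diff if diff < 16 else new_width + 32 - diff
image = image.resize((new_width, new_height))

# load the preprocessing class and the model (assumed checkpoint)
feature_extractor = GLPNFeatureExtractor.from_pretrained("vinvino02/glpn-nyu")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-nyu")

# prepare the image for the model and predict the depth map
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# drop the batch dimension and convert the depth map to a NumPy array
output = predicted_depth.squeeze().cpu().numpy()

# center-crop both the depth map and the image to discard unreliable borders
pad = 16
output = output[pad:-pad, pad:-pad]
image = image.crop((pad, pad, image.width - pad, image.height - pad))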

To use GLPN, the Transformers library provides two classes: GLPNFeatureExtractor, used to preprocess each input, and GLPNForDepthEstimation, which is the model class.

Because of its architecture, the model produces an output whose height and width are the input height and width rounded down to the nearest multiple of 32:

H_out = 32 · ⌊H / 32⌋, W_out = 32 · ⌊W / 32⌋

Thus, the image is resized so that both its height and width are multiples of 32; otherwise, the output of the model would be smaller than the input. This is required because the point cloud will be painted using the image pixels, and to do so the input image and the output depth map must have the same size.

As monocular depth estimation models struggle to produce high-quality predictions near the borders, the output depth map is center-cropped. To keep the same dimensions between input and output, the image is center-cropped as well.

These are some predictions:

Depth prediction of a bedroom. Input image from NYU-Depth V2.
Depth prediction of a playroom. Input image from NYU-Depth V2.
Depth prediction of an office. Input image from NYU-Depth V2.

2. Point cloud construction

For the 3D processing part of this guide, Open3D⁵ will be used. It’s probably the best Python library for this kind of task.

To install the latest version of Open3D from PyPI, use:

pip install open3d

The following code converts the estimated depth map to an Open3D point cloud object:
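
The original snippet is not reproduced here, so the following is a minimal sketch of this step. It reuses the image and output variables from the depth-estimation code; scaling the depth map to 8-bit values and using focal lengths of 500 with a principal point at the image centre are placeholder assumptions, not calibrated parameters.

import numpy as np
import open3d as o3d

width, height = image.size

# pack the RGB image and the predicted depth map into an Open3D RGBD image
depth_image = (output * 255 / np.max(output)).astype("uint8")
image_np = np.array(image)

depth_o3d = o3d.geometry.Image(depth_image)
image_o3d = o3d.geometry.Image(image_np)
rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
    image_o3d, depth_o3d, convert_rgb_to_intensity=False)

# intrinsic camera matrix: fx, fy (focal lengths) and cx, cy (principal point) are assumed values
camera_intrinsic = o3d.camera.PinholeCameraIntrinsic()
camera_intrinsic.set_intrinsics(width, height, 500, 500, width / 2, height / 2)

# back-project the RGBD image into a coloured point cloud
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, camera_intrinsic)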

An RGBDImage is simply a combination of an RGB image and its corresponding depth image. The PinholeCameraIntrinsic class stores what is known as the intrinsic camera matrix. Through this matrix, Open3D can create a point cloud from an RGBD image with the correct spacing between the points. Keep the intrinsic parameters as they are. For more details, see the additional resources at the end of the guide.

To visualize the point cloud use:

o3d.visualization.draw_geometries([pcd])

3. Mesh generation

Among the various methods available in the literature for this task, this guide uses the Poisson surface reconstruction³ algorithm, since it usually produces better and smoother results than the alternatives.

This code generates the mesh from the point cloud obtained in the last step by using the Poisson algorithm:
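
Again, the original snippet is not included here, so this is a minimal sketch of the mesh-generation step under the same assumptions as above; the outlier-removal parameters, the normal orientation, the final rotation, and the output filename are illustrative choices, not the author’s exact settings.

# remove statistical outliers: points whose distance to their neighbours deviates too much
cl, ind = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
pcd = pcd.select_by_index(ind)

# estimate the normals required by the Poisson algorithm and orient them consistently
pcd.estimate_normals()
pcd.orient_normals_to_align_with_direction(orientation_reference=np.array([0.0, 0.0, -1.0]))

# Poisson surface reconstruction; the depth parameter controls the level of detail
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=10)

# rotate the mesh so it is not upside down (assumed camera convention)
rotation = mesh.get_rotation_matrix_from_xyz((np.pi, 0.0, 0.0))
mesh.rotate(rotation, center=np.array([0.0, 0.0, 0.0]))

# save the result to disk (placeholder filename) for inspection in MeshLab
o3d.io.write_triangle_mesh("mesh.obj", mesh)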

First of all, the code removes outliers from the point cloud. A point cloud might contain noise and artefacts due to a variety of causes; in this scenario, the model might have predicted some depths that differ too much from those of their neighbours.

The next step is normal estimation. A normal is a vector (thus with a magnitude and a direction) perpendicular to a surface or an object, and the Poisson algorithm requires normals to be estimated for every point. For more details about these vectors, see the additional resources at the end of the guide.

Finally, the algorithm is executed. The depth parameter defines the level of detail of the mesh. A higher depth value increases mesh quality, but it also increases the size of the output.

To visualize the mesh, I advise you to download MeshLab, because some 3D visualization programs can’t render colors.

This is the final result:

The generated mesh. Image by the author.
The generated mesh (from another angle). Image by the author.

As the final result changes according to the depth value, here is a comparison among different values:

Comparison between different depth values. Image by the author.

Running the algorithm with depth=5 led to a 375 KB mesh, depth=6 to 1.2 MB, depth=7 to 5 MB, depth=8 to 19 MB, depth=9 to 70 MB, and depth=10 to 86 MB.

Conclusion

Despite using only a single image, the result is pretty good. With the help of some 3D editing, you can reach even better results. As this guide can’t completely cover all the details of 3D data processing, I advise you to read the resources below to better understand all the aspects involved.

Additional resources:

Thanks for reading, I hope you have found this useful.

References

[1] H. Edelsbrunner, and E. P. Mücke, Three-dimensional Alpha Shapes (1994)

[2] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin, The ball-pivoting algorithm for surface reconstruction (1999)

[3] M. Kazhdan, M. Bolitho, and H. Hoppe, Poisson Surface Reconstruction (2006)

[4] D. Kim, W. Ga, P. Ahn, D. Joo, S. Chun, and J. Kim, Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth (2022)

[5] Q. Zhou, J. Park, and V. Koltun, Open3D: A Modern Library for 3D Data Processing (2018)

[6] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor Segmentation and Support Inference from RGBD Images (2012)

