Generate a 3D Mesh from an Image with Python | by Mattia Gatti | Oct, 2022
Combine Deep Learning with 3D data processing to generate a mesh
Generating a 3D mesh from a single 2D image seemed a very hard task just a few years ago. Nowadays, thanks to advances in Deep Learning, multiple monocular depth estimation models have been developed that can provide a precise depth map from any image. Through this map, it's possible to generate a mesh by performing surface reconstruction.
Introduction
Monocular depth estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. The output of a monocular depth estimation model is a depth map, which is basically a matrix, where each element corresponds to the predicted depth of the associated pixel in the input image.
The elements of a depth map can be seen as a collection of points with 3-axis coordinates. Since the map is a matrix, each element has x and y components (its column and row indices), while the z component is its stored value, i.e. the predicted depth at (x, y). In the field of 3D data processing, a list of (x, y, z) points is called a point cloud.
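As a toy illustration (not part of the original guide), a small 2×2 depth matrix can be flattened into such a list of (x, y, z) points with NumPy:

```python
import numpy as np

# a toy 2x2 depth map: each element is a predicted depth value
depth_map = np.array([[1.0, 2.0],
                      [3.0, 4.0]])

# x = column index, y = row index, z = stored depth value
rows, cols = np.indices(depth_map.shape)
point_cloud = np.stack([cols.ravel(), rows.ravel(), depth_map.ravel()], axis=1)

print(point_cloud)  # four (x, y, z) points, one per pixel
```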
Starting from an unstructured point cloud it is possible to obtain a mesh. A mesh is a 3D object representation consisting of a collection of vertices and polygons. The most common type of mesh is the triangle mesh, which comprises a set of three-dimensional triangles connected by their common edges or vertices. In the literature there exist several methods to obtain a triangle mesh from a point cloud; the most popular are Alpha shape¹, Ball pivoting², and Poisson surface reconstruction³. These methods are known as surface reconstruction algorithms.
The procedure used in this guide to generate a mesh from an image is made up of three phases:
- Depth estimation — the depth map of the input image is generated using a monocular depth estimation model.
- Point cloud construction — the depth map is converted into a point cloud.
- Mesh generation — from the point cloud, a mesh is generated by using a surface reconstruction algorithm.
To follow the steps of this guide you need an image. In case you haven't got one at your fingertips, you can download this one:
1. Depth estimation
The monocular depth estimation model chosen for this guide is GLPN⁴. It is available on the Hugging Face Model Hub. Models can be retrieved from this hub by using the Hugging Face library Transformers.
To install the latest version of Transformers from PyPI, use:
pip install transformers
The following code is used to estimate the depth of an input image:
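A minimal sketch of this step, assuming the pretrained `vinvino02/glpn-nyu` checkpoint from the Hugging Face Hub (the synthetic grey image is a stand-in; load your own with `Image.open`):

```python
import numpy as np
import torch
from PIL import Image
from transformers import GLPNFeatureExtractor, GLPNForDepthEstimation

feature_extractor = GLPNFeatureExtractor.from_pretrained("vinvino02/glpn-nyu")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-nyu")

# load the input image (a synthetic stand-in here; use Image.open(...) instead)
image = Image.new("RGB", (640, 480), (128, 128, 128))

# resize so that both height and width are multiples of 32
new_height = 480 if image.height > 480 else image.height
new_height -= new_height % 32
new_width = int(new_height * image.width / image.height)
diff = new_width % 32
new_width = new_width - diff if diff < 16 else new_width + 32 - diff
image = image.resize((new_width, new_height))

# preprocess the image and run inference
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth

# drop a 16-pixel border, where predictions tend to be unreliable,
# and crop the image identically so both keep the same size
pad = 16
output = predicted_depth.squeeze().cpu().numpy() * 1000.0
output = output[pad:-pad, pad:-pad]
image = image.crop((pad, pad, image.width - pad, image.height - pad))
```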
To use GLPN, the Transformers library provides two classes: GLPNFeatureExtractor, used to preprocess each input, and GLPNForDepthEstimation, the model class itself.
Because of its architecture, the model outputs predictions whose height and width are the input dimensions rounded down to the nearest multiple of 32. Thus, image is resized so that both its height and width are multiples of 32; otherwise, the output of the model would be smaller than the input. This is required because the point cloud will be painted using the image pixels, and to do so the input image and the output depth map must have the same size.
As monocular depth estimation models struggle to produce high-quality predictions near the borders, output is center-cropped. To keep the same dimensions between input and output, image is center-cropped as well.
These are some predictions:
2. Point cloud construction
For the 3D processing part of this guide, Open3D⁵ will be used. It's probably the best Python library for this kind of task.
To install the latest version of Open3D from PyPI, use:
pip install open3d
The following code converts the estimated depth map to an Open3D point cloud object:
An RGBD image is simply the combination of an RGB image and its corresponding depth image. The PinholeCameraIntrinsic class stores what is known as the intrinsic camera matrix. Through this matrix, Open3D can create a point cloud from an RGBD image with the correct spacing between the points. Keep the intrinsic parameters as they are. For more details, see the additional resources at the end of the guide.
To visualize the point cloud use:
o3d.visualization.draw_geometries([pcd])
3. Mesh generation
Among the various methods available in the literature for this task, this guide uses the Poisson surface reconstruction³ algorithm. It has been chosen because it usually produces smoother and better-looking results than the alternatives.
This code generates the mesh from the point cloud obtained in the last step by using the Poisson algorithm:
First of all, the code removes outliers from the point cloud. A point cloud might contain noise and artefacts due to a variety of causes; in this scenario, the model might have predicted some depth values that deviate too much from those of their neighbours.
The next step is normal estimation. A normal is a vector (thus having a magnitude and a direction) perpendicular to a surface or an object, and the Poisson algorithm requires an estimated normal for every point. For more details about these vectors, see the additional resources at the end of the guide.
Finally, the algorithm is executed. The depth value defines the level of detail of the mesh: a higher depth increases the mesh quality, but also the size of the output file.
To visualize the mesh, I advise you to download MeshLab, because some 3D visualization programs can't render colors.
This is the final result:
As the final result changes according to the depth value, this is a comparison among different ones: depth=5 led to a 375 KB mesh, depth=6 to 1.2 MB, depth=7 to 5 MB, depth=8 to 19 MB, depth=9 to 70 MB, and depth=10 to 86 MB.
Conclusion
Despite starting from a single image, the result is pretty good. With the help of some 3D editing, you can reach even better results. As this guide can't completely cover all the details of 3D data processing, I advise you to read the resources below to better understand all the aspects involved.
Additional resources:
Thanks for reading, I hope you have found this useful.
References
[1] H. Edelsbrunner, and E. P. Mücke, Three-dimensional Alpha Shapes (1994)
[2] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin, The ball-pivoting algorithm for surface reconstruction (1999)
[3] M. Kazhdan, M. Bolitho and H. Hoppe, Poisson Surface Reconstruction (2006)
[4] D. Kim, W. Ga, P. Ahn, D. Joo, S. Chun, and J. Kim, Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth (2022)
[5] Q. Zhou, J. Park, and V. Koltun, Open3D: A Modern Library for 3D Data Processing (2018)
[6] N. Silberman, D. Hoiem, P. Kohli, and Rob Fergus, Indoor Segmentation and Support Inference from RGBD Images (2012)