iMAP: Modeling 3D Scenes in Real-Time | by Cameron R. Wolfe | May, 2023

By Jessie Hobb On May 13, 2023

Learning 3D environments with handheld RGB-D cameras

So far, we have only seen offline approaches for modeling 3D scenes (e.g., NeRF, SRNs, DeepSDF [2, 3, 4]). Despite their impressive performance, these approaches require days, or even weeks, of computation time for the underlying neural networks to be trained. For example, NeRFs are trained for nearly two days for just representing a single scene. Plus, using the the neural net to evaluate a new scene viewpoint can be quite expensive too! With this in mind, we might wonder whether it’s possible to learn a scene representation a bit faster than this.

This question was explored in [1] with the proposal of iMAP, a real-time system for representing scenes and localizing (i.e., tracking the pose of) devices in the scene. To understand what this means, consider a camera that is moving through a scene and capturing the surrounding environment. The task of iMAP is to (i) take in this data, (ii) build a 3D representation of the scene being observed, and (iii) infer the location and orientation of the camera (i.e., the device) as it captures the scene!

iMAP adopts an approach that is quite similar to NeRF [2] with a few differences:

It is based upon RGB-D data.
A streaming setup is assumed.

As such, the model receives both depth and color information as input. Additionally, the learning process begins from a completely random initialization, and iMAP must learn from new, incoming RGB-D images in real-time. Given this setup, iMAP is expected to (i) model the scene and (ii) predict the pose of the RGB-D camera for each incoming image (i.e., prior methods assume pose information as an input!). Despite this difficult training setup, iMAP can learn 3D representations of entire rooms in real-time!

why is this paper important? This post is part of my series on deep learning for 3D shapes and scenes. This area was recently revolutionized by the proposal of NeRF [2]. With a NeRF representation, we can produce an arbitrary number of synthetic viewpoints of a scene or even generate 3D representations of relevant objects; see below.

Learning 3D environments with handheld RGB-D cameras

iMAP adopts an approach that is quite similar to NeRF [2] with a few differences:

It is based upon RGB-D data.
A streaming setup is assumed.

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.