CALVIN — a neural network that can learn to plan and navigate unknown environments


Collision Avoidance Long-term Value Iteration Network

Summary: CALVIN is a neural network that can plan, explore and navigate in novel 3D environments. It learns tasks such as solving mazes just by learning from expert demonstrations. Our work builds upon Value Iteration Networks (VIN) [1], a type of recurrent convolutional neural network. While VINs only work well in fully-known environments, CALVIN works even in unknown environments, where the agent has to explore in order to find the target.

The three environments CALVIN has been tested in (image by author)

In this article I would like to give a high-level overview of a paper I recently published at CVPR 2022 (Conference on Computer Vision and Pattern Recognition). The motivation for this research was to come up with a more robust neural network architecture that can learn to plan, inspired by the work on Value Iteration Networks [1].

Code: https://github.com/shuishida/calvin

The problem we address is visual navigation from demonstrations. A robotic agent must learn how to navigate, given a fixed set of expert trajectories consisting of RGB-D images and the actions taken. While it is easy to plan with a top-down map that specifies where the obstacles and targets are, it is much more challenging if the agent has to learn the nature of obstacles and targets from the RGB-D images.

A sequence of images and actions that the agent sees as expert demonstrations (image by author)

Another important aspect of navigation is exploration. Our agent starts without any knowledge about the new environment, so it has to build a map of the environment as it navigates, and learn to explore areas that are most likely to lead to the target.

The agent learns to predict rewards that best explain expert demonstrations (image by author)

For the agent to be able to navigate in environments it hasn’t been trained on, it has to learn some general knowledge applicable across all environments. In particular, we focus on learning a shared transition model and reward model that best explain expert demonstrations, which can then be applied to new setups.

The agent learns motion dynamics that are reusable across all environments (image by author)

Our model consists of two parts: a learnt mapping component we call the Lattice PointNet, which aggregates past observations into a ground-projected map of embeddings, and CALVIN, a differentiable planner that models value iteration. Unlike more common approaches in reinforcement learning, where the agent sees an image and reactively predicts the best action, our agent combines the spatial representation learnt by the Lattice PointNet with CALVIN as a planning network, so it can explore and navigate while taking past observations into account in a spatially meaningful way.
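
To make the mapping step more concrete, below is a minimal sketch (in PyTorch) of how per-point features from back-projected RGB-D pixels could be pooled into a ground-plane grid of embeddings. The function name, grid resolution and max-pooling are illustrative assumptions, not the exact Lattice PointNet implementation.

```python
import torch

def ground_project(points_xyz, feats, grid_size=64, cell=0.25):
    """Pool per-point features onto a 2D ground-plane grid of embeddings.

    points_xyz: (N, 3) world-frame points from back-projected RGB-D pixels.
    feats:      (N, C) per-point embeddings (e.g. from a PointNet-style MLP).
    Returns a (C, grid_size, grid_size) feature map; empty cells stay zero.
    """
    # Discretise x/y into grid indices, ignoring height, and drop out-of-bound points.
    ij = (points_xyz[:, :2] / cell).long() + grid_size // 2
    in_bounds = ((ij >= 0) & (ij < grid_size)).all(dim=1)
    ij, feats = ij[in_bounds], feats[in_bounds]

    C = feats.shape[1]
    grid = torch.zeros(C, grid_size, grid_size)
    flat = grid.view(C, -1)                                   # shares storage with `grid`
    idx = (ij[:, 0] * grid_size + ij[:, 1]).unsqueeze(0).expand(C, -1)
    # Max-pool the features of all points that fall into the same cell.
    flat.scatter_reduce_(1, idx, feats.t(), reduce="amax", include_self=False)
    return grid
```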

Overview of the model architecture (image by author)

CALVIN is an improved version of the Value Iteration Network (VIN), which uses recurrent convolution as a form of value iteration for spatial tasks. It learns a reward map and a convolutional kernel which, applied repeatedly following the value iteration update equation, produce a Q-value map: an estimate of the future rewards the agent can obtain. Once the value map is computed, the agent can simply take the action with the highest value.
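
To give a feel for what "recurrent convolution as value iteration" means, here is a minimal sketch of the VIN-style update: the learnt reward map and the current value map are stacked and convolved to produce a Q-value map, and a max over the action channel gives the next value map. The tensor shapes and the number of iterations are illustrative choices, not the exact configuration from the paper.

```python
import torch
import torch.nn.functional as F

def vin_value_iteration(reward, q_kernel, num_iters=40):
    """VIN-style value iteration on a 2D map.

    reward:   (1, 1, H, W) learnt reward map.
    q_kernel: (A, 2, 3, 3) learnt kernel mixing reward and value for A actions.
    Returns a (1, A, H, W) Q-value map after `num_iters` Bellman backups.
    """
    value = torch.zeros_like(reward)
    for _ in range(num_iters):
        # One convolution over [reward, value] acts as one Bellman backup.
        q = F.conv2d(torch.cat([reward, value], dim=1), q_kernel, padding=1)
        value, _ = q.max(dim=1, keepdim=True)   # max over the action channel
    return q
```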

While the VIN is a simple architecture, it has several flaws, most notably that it does not strictly perform value iteration in practice. As you can see in the figure below, the value map produced by the VIN is not what we would expect from value iteration, whereas CALVIN learns to produce a value map that is almost identical to the theoretical solution. We traced this mismatch to the VIN not being constrained enough to penalise obstacles, which leads to suboptimal decisions such as repeatedly exploring dead ends.

Comparison of value maps produced by the VIN and CALVIN. The VIN produces non-interpretable, brittle value maps. (image by author)

CALVIN, on the other hand, explicitly learns which transitions are valid and which are not. It decomposes the transition model into a shared agent motion model and an action availability model. CALVIN uses the action availability model to penalise invalid actions and to prevent values from propagating from unreachable states. In addition to these constraints on available actions, we also improved the training loss so that the model can leverage training signals across the entire trajectory, rather than just the current state.
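
One way to picture the effect of the action availability model is as a mask over the Q-values: actions predicted to be unavailable are pushed to a large negative value, so they are never selected and cannot carry value out of unreachable cells. The sketch below extends the VIN recurrence above in that spirit; the paper's exact parameterisation differs, and the penalty constant and tensor shapes here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def calvin_value_iteration(reward, motion_kernel, availability, num_iters=40, penalty=-1e3):
    """Value iteration with an explicit action availability mask.

    reward:        (1, A, H, W) learnt per-action reward map.
    motion_kernel: (A, 1, 3, 3) shared agent motion model (how value flows per action).
    availability:  (1, A, H, W) in [0, 1], predicted probability each action is valid.
    """
    value = torch.zeros(1, 1, *reward.shape[-2:])
    for _ in range(num_iters):
        propagated = F.conv2d(value, motion_kernel, padding=1)   # (1, A, H, W)
        q = reward + propagated
        # Unavailable actions are heavily penalised, so they are never chosen and
        # values cannot propagate through cells the agent cannot actually reach.
        q = availability * q + (1.0 - availability) * penalty
        value, _ = q.max(dim=1, keepdim=True)
    return q
```

During training, the resulting Q-value map is supervised with the expert actions along the whole demonstration trajectory (with trajectory reweighting), rather than only at the currently visited state.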

Model diagram of CALVIN (image by author)

We performed experiments, focusing on exploration of novel, unknown environments, in three domains: a grid maze environment, MiniWorld [2], and the Active Vision Dataset [3]. CALVIN achieved more robust navigation, even in unknown environments, demonstrating explorative behaviour that the VIN lacks.

In our grid maze setup, the agent can only view the maze locally. It can choose to move forward, rotate left or right, or trigger a "done" action. We can see that the agent predicts higher values for places it hasn't explored yet, and a high reward for the target location once it sees the target.

CALVIN in a Grid Maze Environment

Next, we ran a similar experiment in a 3D maze environment called MiniWorld, but this time using RGB-D image sequences from the agent’s perspective rather than a top-down view. While the agent navigates, it builds up a map of embeddings with the Lattice PointNet, which is then fed into CALVIN. Here too, the agent has learned to assign lower values to walls and higher values to unexplored locations. We can observe that the agent manages to backtrack upon hitting a dead-end, and replan towards other unexplored cells. When the agent sees the target, it assigns a high reward to cells near the target.

CALVIN in MiniWorld

Finally, we tested the agent using the Active Vision Dataset, which is a collection of real-world images obtained by a robotic platform, from which we can create trajectories. For this task, we used pre-trained ResNet embeddings and fed them into the Lattice PointNet. The agent was trained to navigate towards a soda bottle in the room.
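
For a rough idea of that step, a frozen, pre-trained ResNet can be used as a per-frame feature extractor along the following lines; the specific ResNet variant, layer cut-off and input size are assumptions rather than the paper's exact configuration.

```python
import torch
import torchvision

# Frozen ResNet-18 backbone without the average-pooling and classification layers.
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

with torch.no_grad():
    rgb = torch.rand(1, 3, 224, 224)          # colour channels of one RGB-D frame
    feats = feature_extractor(rgb)            # (1, 512, 7, 7) spatial embeddings

# Back-projected with the depth channel, these per-location embeddings become the
# point features that the Lattice PointNet pools into the ground-plane map.
```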

CALVIN in the Active Vision Dataset environment

CALVIN is able to explore and navigate unknown environments more robustly than other differentiable planners. This improvement over the VIN comes from explicitly modelling action availability, which is used to penalise invalid actions, together with an improved training loss that uses trajectory reweighting. We also introduced a Lattice PointNet backbone that efficiently fuses past observations in a spatially consistent way.

[1] Tamar et al., “Value Iteration Networks”, NeurIPS 2016.

[2] M. Chevalier-Boisvert, https://github.com/maximecb/gym-miniworld, 2018.

[3] Ammirato et al., "A Dataset for Developing and Benchmarking Active Vision", ICRA 2017.

