Firehose: Wildfire Prevention and Management using Deep Reinforcement Learning

William Shen, Aidan Curtis

MIT - 6.484 Spring 2022 Final Project

Paper (PDF) | Presentation (Google Slides)

Firehose is open source! Code (GitHub)

We use deep reinforcement learning to train AI agents that can combat wildfires. This page shows videos of our learned policies.
Please see our paper for more details.

Baselines: random, Min-L2, Max-L2 (see paper for details).

Task Setting 1: Training and Evaluating on Fixed Ignition Points

Videos, left to right: random, Min-L2, Max-L2, Ours (Maskable-PPO with CNN Policy Network)
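For readers unfamiliar with the "maskable" part of Maskable-PPO, the general idea of action masking can be sketched as follows: invalid actions are given zero probability before an action is sampled from the policy's output. This is a minimal illustration of the technique, not the project's implementation. The function names are hypothetical, and which cells count as invalid in Firehose is not stated on this page, so the mask in the usage example is purely illustrative.

```python
import math
import random


def masked_probabilities(logits, valid_mask):
    """Softmax over logits, with invalid actions forced to probability 0.

    logits: list of floats, one score per action (hypothetical policy output).
    valid_mask: list of bools, True where the action is allowed (assumption).
    """
    if len(logits) != len(valid_mask):
        raise ValueError("logits and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    max_logit = max(l for l, ok in zip(logits, valid_mask) if ok)
    exps = [math.exp(l - max_logit) if ok else 0.0
            for l, ok in zip(logits, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_masked_action(logits, valid_mask, rng=random):
    """Sample an action index; masked-out actions are never chosen."""
    probs = masked_probabilities(logits, valid_mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: action 2 has the highest score but is masked out, so only
# actions 0 and 1 can be sampled.
probs = masked_probabilities([0.0, 0.0, 5.0, 0.0], [True, True, False, False])
action = sample_masked_action([0.0, 0.0, 5.0, 0.0], [True, True, False, False])
```

Masking removes invalid choices from the distribution entirely, so the agent does not need to learn from experience that they are useless.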

20x20 Fixed Ignition Point

Observe that the RL agent learns to build a perimeter around the fire to prevent it from spreading further.

40x40 Fixed Ignition Point

As in the 20x20 example above, the RL agent learns to build a perimeter around the fire and then extinguish cells within that perimeter.

Task Setting 2: Training and Evaluating on Random Ignition Points

In this task setting, we test whether our RL agents can learn a reactive policy that generalizes across ignition points.

20x20 Random Ignition Points

Here, we can observe that the RL agent is able to generalize across different ignition points.

40x40 Random Ignition Points

We were unable to train a successful agent within 5 million steps in our 40x40 environment. However, we believe that given more training time the agent would be able to learn a reactive policy.
We believe the high-dimensional action space (40x40 = 1600 actions) makes training sample-inefficient, especially since actions are initially sampled at random.
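The arithmetic behind the action-space size can be sketched as below, assuming one discrete action per grid cell, stored as a flat index in row-major order. That encoding and the helper names are assumptions made for illustration; the paper describes the actual action space.

```python
def num_actions(height, width):
    """Number of discrete actions if there is one action per grid cell."""
    return height * width


def action_to_cell(action, width):
    """Convert a flat action index to (row, col), assuming row-major order."""
    return divmod(action, width)


def cell_to_action(row, col, width):
    """Inverse of action_to_cell under the same row-major assumption."""
    return row * width + col


# Usage: the 40x40 grid has four times as many actions as the 20x20 grid,
# so a uniformly random policy picks any particular cell with probability
# 1/1600 instead of 1/400.
small = num_actions(20, 20)
large = num_actions(40, 40)
last_cell = action_to_cell(large - 1, 40)
```

Under uniform random sampling, the chance of selecting any one useful cell is four times lower on the larger grid, which is consistent with the slower learning described above.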

Task Setting 3: Privileged Rewards

Protecting MIT

In this example, our reward prioritizes protecting the regions formed by the letters "M", "I", and "T".
Our RL agents are able to successfully learn a policy to protect these regions.
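One way a reward can prioritize particular regions is to penalize burning cells inside those regions more heavily than burning cells elsewhere. The sketch below illustrates that idea only. The function name, the grid representation, and the penalty values are all hypothetical and are not taken from the Firehose code or paper.

```python
def priority_weighted_reward(burning, priority,
                             base_penalty=1.0, priority_penalty=10.0):
    """Negative reward that grows with the number of burning cells.

    burning:  2D list of bools, True where a cell is on fire (assumption).
    priority: 2D list of bools, True where a cell lies in a protected
              region, e.g. the letter shapes (assumption).
    Penalty values are illustrative placeholders.
    """
    reward = 0.0
    for burn_row, prio_row in zip(burning, priority):
        for is_burning, is_priority in zip(burn_row, prio_row):
            if is_burning:
                reward -= priority_penalty if is_priority else base_penalty
    return reward


# Usage: three cells are burning, one of them inside a protected region.
burning = [[True, False],
           [True, True]]
priority = [[True, False],
            [False, False]]
reward = priority_weighted_reward(burning, priority)
```

With weights like these, an agent maximizing reward would prefer to sacrifice ordinary cells when doing so keeps fire out of the protected regions.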

William Shen and Aidan Curtis, 2022.