We use Deep Reinforcement Learning to train AI agents that combat wildfires. This page presents videos of our learned policies.
Please see our paper for more details.
Baselines: random, Min-L2, and Max-L2 (see the paper for details).
Videos are ordered left to right: random, Min-L2, Max-L2, Ours (Maskable-PPO with a CNN policy network).
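The core idea behind Maskable-PPO is to restrict the policy's action distribution to valid actions at each step. The sketch below shows masked sampling in minimal stdlib Python; it is illustrative only (the function name and the logits are made up), not our actual training code, which builds on a Maskable-PPO implementation as described in the paper.

```python
import math
import random

def masked_sample(logits, mask):
    """Sample an action index, ignoring masked-out (invalid) actions.

    logits: list of unnormalized scores, one per discrete action.
    mask: list of bools; True means the action is currently allowed.
    """
    # Invalid actions receive zero probability mass, equivalent to
    # setting their logits to -inf before the softmax.
    exp = [math.exp(l) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exp)
    probs = [e / total for e in exp]
    # Inverse-CDF sampling over the masked distribution.
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Example: 4 actions, only actions 1 and 3 are valid.
action = masked_sample([0.5, 2.0, 1.0, 2.0], [False, True, False, True])
```

In the wildfire setting, masking lets the agent avoid wasting samples on actions that cannot help (e.g., acting on cells that are already burned), which matters most early in training when the policy is near-uniform.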
Observe that the RL agent learns to build a perimeter around the fire to prevent it from spreading further.
As in the 20x20 example above, the RL agent learns to build a perimeter around the fire and then extinguish the cells within that perimeter.
In this task setting, we test whether our RL agents can learn a reactive policy that generalizes across ignition points.
Here, we observe that the RL agent generalizes across different ignition points.
We were unable to train a successful agent within 5 million steps in our 40x40 environment; however, we believe that with more training time it would learn a reactive policy.
We believe the high-dimensional action space (40x40 = 1600 discrete actions) leads to sample-inefficient training, especially since actions are initially sampled at random.
In this example, our reward prioritizes protecting the regions formed by the letters "M", "I", and "T".
Our RL agents successfully learn a policy that protects these regions.
William Shen and Aidan Curtis, 2022.