Improving Reward Functions in Robots Playing Capture the Flag Using Q-Learning
This paper builds on previous work that used Q-learning to teach simulated marine robots to play capture the flag against an unintelligent enemy robot. It focuses on improving reward functions to achieve two goals: making the attacking robot more tactical in its approach to the flag, and eliminating the attacking robot's indifference to the enemy's location, which often leads to the attacking robot being captured. The paper modifies existing reward function components (dependent on the attacking robot's distance to the nearest boundary, to the enemy, and to the enemy flag) and contributes two new components (dependent on the robot's angle to the enemy flag and on its angle to the enemy's heading, termed the deviation angle). Four models are then trained with variations of these reward function components to improve the attacking robot's capture-the-flag behavior. The models found more efficient approaches to the flag than the previous project, although the attacking robot captured the flag fewer times than before. These findings nevertheless suggest that follow-on work can exceed the results of the parent project. This work will be continued in search of more efficient reward functions that increase the intelligence of the attacking robot's approach, and future work will focus on teaching multi-robot teams to capture the flag from an intelligent adversary.
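To make the abstract's reward-function components concrete, the sketch below combines the distance terms (to the nearest boundary, the enemy, and the enemy flag) and the two angle terms (angle to the enemy flag, and the deviation angle relative to the enemy's heading) into a single scalar reward. The function name, signature, and weights are illustrative assumptions, not taken from the paper; the actual components and their weighting are defined in the full text.

```python
import math

def shaped_reward(robot_xy, robot_heading, enemy_xy, enemy_heading,
                  flag_xy, boundary_dist,
                  w_flag=1.0, w_enemy=0.5, w_boundary=0.25,
                  w_flag_angle=0.5, w_deviation=0.5):
    """Hypothetical composite reward for the attacking robot.

    All weights and the linear combination are assumptions for
    illustration only; the paper tunes its own components.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def angular_offset(src_xy, heading, dst_xy):
        # Absolute offset (radians, in [0, pi]) between `heading` and
        # the bearing from src_xy to dst_xy.
        bearing = math.atan2(dst_xy[1] - src_xy[1], dst_xy[0] - src_xy[0])
        diff = (bearing - heading + math.pi) % (2 * math.pi) - math.pi
        return abs(diff)

    r = 0.0
    r -= w_flag * dist(robot_xy, flag_xy)        # closer to the flag is better
    r += w_enemy * dist(robot_xy, enemy_xy)      # farther from the enemy is better
    r += w_boundary * boundary_dist              # stay clear of the boundary
    r -= w_flag_angle * angular_offset(robot_xy, robot_heading, flag_xy)
    # "Deviation angle": offset between the enemy's heading and the bearing
    # from the enemy to the attacker; larger means the enemy faces away.
    r += w_deviation * angular_offset(enemy_xy, enemy_heading, robot_xy)
    return r
```

Under these assumed weights, a robot that is near the flag, pointed at it, and outside the enemy's heading cone scores higher than one that is far from the flag and facing away from it, which is the qualitative behavior the abstract's components aim for.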
Index Terms: Conferences, Computational modeling, Aggregates, Education, Robots, Marine robots
Citation: T. Powers, M. Novitzky and C. Korpela, "Improving Reward Functions in Robots Playing Capture the Flag Using Q-Learning," 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), NV, USA, 2021, pp. 0426-0431, doi: 10.1109/CCWC51732.2021.9375906.