Autonomous cyber warfare agents: dynamic reinforcement learning for defensive cyber operations

In this work, we aim to develop novel cybersecurity playbooks by exploiting dynamic reinforcement learning (RL) methods to close holes in the attack surface left open by the traditional signature-based approach to Defensive Cyber Operations (DCO). A useful first proof-of-concept is provided by the problem of training a scanning defense agent using RL; as a first line of defense, it is important to protect sensitive networks from network mapping tools. To address this challenge, we developed a hierarchical, Monte Carlo-based RL framework for the training of an autonomous agent which detects and reports the presence of Nmap scans in near real-time, efficiently and with near-perfect accuracy. Our algorithm is powered by a reduction of the state space given by a transformer, CLAPBAC, an anomaly detection tool which applies natural language processing to cybersecurity in a manner consistent with state-of-the-art. In a realistic scenario emulated in CyberVAN, our approach generates optimized playbooks for effective defense against malicious insiders inappropriately probing sensitive networks.
Cyber Security
David A. Bierbrauer, Robert M. Schabinger, Caleb Carlin, Jonathan Mullin, John A. Pavlik, and Nathaniel D. Bastian "Autonomous cyber warfare agents: dynamic reinforcement learning for defensive cyber operations", Proc. SPIE 12538, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications V, 125380E (12 June 2023);