Generating realistic cyber data for training and evaluating machine learning classifiers for network intrusion detection systems

Authors

Chalé, Marc
Bastian, Nathaniel D.

Issue Date

2022-07-02

Type

journal-article

Language

en_US

Abstract

Cyberspace operations, in conjunction with artificial intelligence and machine learning-enhanced cyberspace infrastructure, make it possible to connect sensors directly to shooters independent of human control. These technologies serve as the pivot around which cyber data from the military’s Internet of Battlefield Things, for example, will be turned into actionable insight and knowledge and, ultimately, an information advantage for the military. As such, network intrusion detection systems must detect, evaluate, and respond to malicious cyber traffic at machine speed. Generative adversarial networks and variational autoencoders are fit as generative models to labeled cyber data from a real military enterprise network. These generative models are used to create realistic, synthetic cyber data. Combinations of real and synthetic cyber data sets are then used to train several machine learning models for network intrusion detection. Purely synthetic data is shown to be statistically similar to the real data. There is no statistically significant difference in the performance of classifiers trained with real data versus a combination of real and synthetic data; however, classifiers trained with only synthetic data underperformed. To avoid a decrease in intrusion detection performance, classifiers must be trained with at least 15% real data.
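
The abstract describes two mechanical steps: checking that synthetic records are statistically similar to real ones, and training on a mix of real and synthetic records with a floor on the real share. The following is a minimal, hypothetical sketch of those two steps in plain Python, not the authors' code. The names `ks_statistic`, `mix_training_set`, and `MIN_REAL_FRACTION` are invented for illustration; the two-sample Kolmogorov-Smirnov statistic is one common per-feature similarity measure (the abstract does not say which test the authors used); and reading "at least 15% real data" as a fraction of training samples is an assumption.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical constant: the 15% floor from the abstract, assumed here to be
# a fraction of training samples.
MIN_REAL_FRACTION = 0.15


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    Compares one numeric feature of real data (a) with the same feature of
    synthetic data (b); D near 0 means the empirical distributions agree.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so the maximum
    # gap is attained at one of them.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def mix_training_set(
    real: Sequence[T],
    synthetic: Sequence[T],
    real_fraction: float,
    total: int,
    seed: int = 0,
    min_real_fraction: float = MIN_REAL_FRACTION,
) -> List[T]:
    """Draw a shuffled training set with a given share of real records.

    Assumption: samples are drawn without replacement from each pool.
    """
    if not min_real_fraction <= real_fraction <= 1.0:
        raise ValueError(
            f"real_fraction must be in [{min_real_fraction}, 1.0], got {real_fraction}"
        )
    n_real = round(real_fraction * total)
    n_synthetic = total - n_real
    if n_real > len(real) or n_synthetic > len(synthetic):
        raise ValueError("not enough real or synthetic records for the requested mix")
    rng = random.Random(seed)
    mixed = rng.sample(list(real), n_real) + rng.sample(list(synthetic), n_synthetic)
    rng.shuffle(mixed)  # avoid ordering effects during training
    return mixed
```

For example, `mix_training_set(real, synthetic, real_fraction=0.2, total=1000)` would draw 200 real and 800 synthetic records, while a `real_fraction` below 0.15 is rejected rather than silently accepted, mirroring the abstract's recommendation of at least 15% real data. In practice the similarity check would be run per feature, and a library routine with p-values would likely replace the hand-written statistic.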

Citation

Marc Chalé, Nathaniel D. Bastian, Generating realistic cyber data for training and evaluating machine learning classifiers for network intrusion detection systems, Expert Systems with Applications, Volume 207, 2022, 117936, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2022.117936.

Journal

Expert Systems with Applications

Volume

207
ISSN

0957-4174