Generating realistic cyber data for training and evaluating machine learning classifiers for network intrusion detection systems

Chalé, Marc; Bastian, Nathaniel D.

Issue Date

2022-07-02

Type

journal-article

Language

en_US

Abstract

Cyberspace operations, in conjunction with cyberspace infrastructure enhanced by artificial intelligence and machine learning, make it possible to connect sensors directly to shooters independent of human control. These technologies serve as the pivot around which cyber data from the military's Internet of Battlefield Things, for example, will be turned into actionable insight and knowledge and, ultimately, an information advantage for the military. As such, network intrusion detection systems must detect, evaluate, and respond to malicious cyber traffic at machine speed. Generative adversarial networks and variational autoencoders are fit as generative models to labeled cyber data from a real military enterprise network. These generative models are used to create realistic, synthetic cyber data. Combinations of real and synthetic cyber data sets are then used to train several machine learning models for network intrusion detection. Purely synthetic data is shown to be statistically similar to the real data. There is no statistically significant difference between the performance of classifiers trained on real data and that of classifiers trained on a combination of real and synthetic data; however, classifiers trained on synthetic data alone underperformed. To avoid a decrease in intrusion detection performance, classifiers must be trained with at least 15% real data.
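The abstract reports that classifiers need at least 15% real data in their training mix. As a minimal sketch of how such a mixture might be assembled, assuming records are already featurized and labeled, the hypothetical helper `mix_training_set` samples real and synthetic records without replacement to hit a target real fraction and rejects mixtures below a configurable floor. The 0.15 default mirrors the threshold stated in the abstract for this study's data; it is not a general constant, and the paper's actual mixing procedure may differ.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_training_set(
    real: Sequence[T],
    synthetic: Sequence[T],
    real_fraction: float,
    total: int,
    min_real_fraction: float = 0.15,  # assumption: floor taken from the abstract's 15% finding
    seed: int = 0,
) -> List[T]:
    """Return a shuffled training set of `total` records, `real_fraction` of them real.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    if real_fraction < min_real_fraction:
        raise ValueError(
            f"real_fraction {real_fraction} is below the floor {min_real_fraction}"
        )
    n_real = round(total * real_fraction)
    n_synth = total - n_real
    if n_real > len(real) or n_synth > len(synthetic):
        raise ValueError("not enough records to draw the requested mixture")

    rng = random.Random(seed)
    # Draw each source without replacement, then shuffle so real and
    # synthetic records are interleaved rather than grouped by source.
    mixed = rng.sample(list(real), n_real) + rng.sample(list(synthetic), n_synth)
    rng.shuffle(mixed)
    return mixed
```

For example, with 500 real and 2,000 synthetic records, `mix_training_set(real, synthetic, real_fraction=0.15, total=1000)` yields 150 real and 850 synthetic records, while a 10% real fraction is rejected. Sampling without replacement avoids duplicating scarce real records; a study could instead oversample real data, which this sketch does not attempt.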

Citation

Marc Chalé, Nathaniel D. Bastian, Generating realistic cyber data for training and evaluating machine learning classifiers for network intrusion detection systems, Expert Systems with Applications, Volume 207, 2022, 117936, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2022.117936.

URI

https://hdl.handle.net/20.500.14216/124

DOI

https://doi.org/10.1016/j.eswa.2022.117936

ISSN

0957-4174

Collections

Army Cyber Institute
