CHALLENGES AND OPPORTUNITIES FOR GENERATIVE METHODS IN THE CYBER DOMAIN
Authors
Chalé, Marc
Bastian, Nathaniel D.
Issue Date
2021-12-15
Type
Other
Language
en_US
Abstract
Large, high-quality data sets are essential for training machine learning models to perform their tasks accurately. The lack of such training data has constrained machine learning research in the cyber domain. This work explores how Markov Chain Monte Carlo (MCMC) methods can be used to generate realistic synthetic data and compares them with existing generative machine learning techniques: specifically, MCMC is compared with generative adversarial network (GAN) and variational autoencoder (VAE) methods at estimating the joint probability distribution of network intrusion detection system data. A statistical analysis of the synthetically generated cyber data assesses its goodness of fit, with the aim of improving cyber threat detection. The experimental results suggest that the data generated by MCMC fit the true distribution approximately as well as the data generated by GAN and VAE; however, MCMC requires a significantly longer training period and is unproven for higher-dimensional cyber data.
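The abstract does not specify which MCMC sampler or which goodness-of-fit statistic the authors used. As a minimal sketch only, assuming purely categorical features (real intrusion detection data also has continuous features) and a Laplace-smoothed empirical joint distribution as the sampling target, a Metropolis-Hastings chain can draw synthetic records, and total variation distance can serve as a simple stand-in goodness-of-fit measure. The function names `empirical_joint`, `metropolis_hastings`, and `total_variation`, and the toy (protocol, flag) records, are hypothetical illustrations, not the paper's method or data.

```python
import random
from collections import Counter
from itertools import product


def empirical_joint(records, alpha=1.0):
    """Laplace-smoothed joint pmf over every combination of observed feature values."""
    n_features = len(records[0])
    domains = [sorted({r[i] for r in records}) for i in range(n_features)]
    support = list(product(*domains))
    counts = Counter(records)
    total = len(records) + alpha * len(support)
    # Smoothing gives every combination positive mass, so the chain can reach all states.
    pmf = {x: (counts[x] + alpha) / total for x in support}
    return pmf, domains


def metropolis_hastings(pmf, domains, n_samples, burn_in=1000, thin=5, seed=0):
    """Draw synthetic records from a Markov chain whose stationary distribution is `pmf`."""
    rng = random.Random(seed)
    state = rng.choice(list(pmf))
    samples, step = [], 0
    while len(samples) < n_samples:
        # Symmetric proposal: pick one feature and resample its value uniformly.
        i = rng.randrange(len(domains))
        proposal = list(state)
        proposal[i] = rng.choice(domains[i])
        proposal = tuple(proposal)
        # Metropolis acceptance: accept with probability min(1, p(proposal) / p(state)).
        if rng.random() < pmf[proposal] / pmf[state]:
            state = proposal
        step += 1
        # Discard burn-in, then keep every `thin`-th state to reduce autocorrelation.
        if step > burn_in and (step - burn_in) % thin == 0:
            samples.append(state)
    return samples


def total_variation(pmf, samples):
    """Total variation distance between the target pmf and the sample frequencies."""
    counts = Counter(samples)
    n = len(samples)
    return 0.5 * sum(abs(p - counts[x] / n) for x, p in pmf.items())


# Hypothetical toy records: (protocol, connection flag).
records = [("tcp", "SF")] * 50 + [("udp", "SF")] * 30 + [("tcp", "REJ")] * 20
pmf, domains = empirical_joint(records)
synthetic = metropolis_hastings(pmf, domains, n_samples=20000)
print(f"TV distance: {total_variation(pmf, synthetic):.4f}")
```

Because the proposal only resamples one feature at a time, the chain needs many steps (plus burn-in and thinning) to produce nearly independent records, which is one intuitive reason MCMC can be slower than a trained GAN or VAE decoder that emits a sample in a single forward pass.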
Citation
M. Chalé and N. D. Bastian, "CHALLENGES AND OPPORTUNITIES FOR GENERATIVE METHODS IN THE CYBER DOMAIN," 2021 Winter Simulation Conference (WSC), Phoenix, AZ, USA, 2021, pp. 1-12, doi: 10.1109/WSC52266.2021.9715504.
Publisher
2021 Winter Simulation Conference (WSC)
