Comparing Machine Learning Techniques for Zeek Log Analysis
Network logs from intrusion detection and prevention systems such as Zeek give network analysts a wealth of information for identifying malicious activity. However, the volume of data collected necessitates an automated way to filter it. Traditional signature-based “misuse detection” cannot detect previously unseen malicious activity. In contrast, machine learning methods can suggest whether network activity is normal or malicious by learning hard-to-define patterns that discriminate between the two classes. Previous work has applied a variety of machine learning techniques to this problem with some success, but the proprietary nature of real-world data often makes it impossible to compare the performance of different techniques accurately. In this paper, we compare the performance of eight machine learning models on the same real-world dataset, composed of HTTP log data gathered over six months from an enterprise network. Our experiments show that, when trained and tested on the same data, k-Nearest Neighbors achieves 90.3% accuracy and outperforms the others in several ways.
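As a rough illustration of the kind of classifier and metric named above, the following is a minimal sketch of k-nearest-neighbors classification scored by accuracy. It is not the authors' implementation: the two numeric features (URI length and response size), the toy records, the value of k, and the held-out test split are all assumptions made for the example, and none of them comes from the paper's dataset.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Hypothetical toy features per HTTP request: (URI length, response size in kB).
# These values are invented for illustration only.
train_X = [(10, 1.0), (12, 1.2), (11, 0.9), (80, 9.0), (85, 8.5), (90, 9.5)]
train_y = ["normal", "normal", "normal", "malicious", "malicious", "malicious"]
test_X = [(13, 1.1), (88, 9.1)]
test_y = ["normal", "malicious"]

print(accuracy(train_X, train_y, test_X, test_y, k=3))
```

In practice the features would need scaling before computing distances, since Euclidean distance is dominated by whichever feature has the largest numeric range.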
Training, Resistance, Analytical models, Training data, Intrusion detection, Machine learning, Organizations
D. K. Andrews, R. K. Agrawal, S. J. Matthews and A. S. Mentis, "Comparing Machine Learning Techniques for Zeek Log Analysis," 2019 IEEE MIT Undergraduate Research Technology Conference (URTC), Cambridge, MA, USA, 2019, pp. 1-4, doi: 10.1109/URTC49097.2019.9660501.