Comparing Machine Learning Techniques for Zeek Log Analysis
Abstract
Network logs from intrusion detection and prevention systems such as Zeek provide a wealth of information that helps network analysts identify malicious activity. However, the volume of data collected requires an automated way to filter it. Traditional signature-based “misuse detection” cannot detect previously unseen malicious activity. In contrast, machine learning methods can suggest whether network activity is normal or malicious by learning hard-to-define patterns that discriminate between the two classes. Previous work has applied a variety of machine learning techniques to this problem with some success, but because real-world data is often proprietary, accurately comparing the performance of different techniques is frequently impossible. In this paper, we compare the performance of eight machine learning models on the same real-world dataset, consisting of HTTP log data gathered over six months from an enterprise network. Our experiments show that, when all models are trained and tested on the same data, k-nearest neighbors (kNN) achieves 90.3% accuracy and outperforms the other models in several respects.
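k-nearest neighbors labels a new record by a majority vote among the k training records closest to it in feature space. The following is a minimal Python sketch of that idea. It assumes each HTTP log entry has already been reduced to a numeric feature vector. The three features, the toy records, and the helper names (`fit_min_max`, `scale`, `knn_predict`, `accuracy`) are hypothetical illustrations. They do not represent the study's actual feature set, data, or pipeline.

```python
import math
from collections import Counter

# Hypothetical numeric features for one Zeek http.log record:
# (URI length, request body bytes, response body bytes). Illustrative only;
# this is NOT the feature set used in the study.
Features = tuple[float, ...]


def fit_min_max(rows: list[Features]) -> list[tuple[float, float]]:
    """Per-feature (minimum, range) learned from the training rows only."""
    params = []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        params.append((lo, (hi - lo) or 1.0))  # avoid dividing by zero
    return params


def scale(row: Features, params: list[tuple[float, float]]) -> Features:
    """Min-max scale a row so large byte counts do not dominate distances."""
    return tuple((x - lo) / span for x, (lo, span) in zip(row, params))


def knn_predict(train: list[tuple[Features, str]], query: Features, k: int = 3) -> str:
    """Return the majority label among the k training rows nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train: list[tuple[Features, str]], test: list[tuple[Features, str]], k: int = 3) -> float:
    """Fraction of (already scaled) test rows whose predicted label is correct."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Toy, made-up records chosen only so the two classes are easy to separate.
raw_train = [
    ((20, 0, 5000), "normal"),
    ((25, 0, 4800), "normal"),
    ((30, 100, 5200), "normal"),
    ((250, 4000, 200), "malicious"),
    ((300, 3500, 150), "malicious"),
    ((280, 5000, 100), "malicious"),
]
raw_test = [((22, 0, 5100), "normal"), ((270, 4200, 180), "malicious")]

params = fit_min_max([x for x, _ in raw_train])
train = [(scale(x, params), y) for x, y in raw_train]
test = [(scale(x, params), y) for x, y in raw_test]

for x, _ in test:
    print(knn_predict(train, x, k=3))
print(f"accuracy: {accuracy(train, test, k=3):.1%}")
```

The scaling parameters are fit on the training rows alone, which keeps test data out of preprocessing. Without scaling, byte-count features would swamp URI length in the Euclidean distance. The choice of k = 3 here is arbitrary and says nothing about the value used in the study.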