The Unbalanced Classification Problem: Detecting Breaches in Security
Rensselaer Polytechnic Institute
This research proposes several methods designed to improve solutions for security classification problems. The security classification problem is an unbalanced, high-dimensional, binary classification problem of a kind that is prevalent today. The imbalance consists of a large majority of negative-class instances and a small minority of positive-class instances. Any system that needs protection from malicious activity, intruders, theft, or other types of breaches in security must address this problem. These breaches in security are considered instances of the positive class. Given numerical data that represent observations or instances requiring classification, state-of-the-art machine learning algorithms can be applied. However, the unbalanced and high-dimensional structure of the data must be considered before applying these learning methods. High-dimensional data poses a “curse of dimensionality” which can be overcome through the analysis of subspaces. Exploration of intelligent subspace modeling and the fusion of subspace models is proposed. A detailed analysis of the one-class support vector machine is included, covering its weaknesses and proposals to overcome them. A fundamental method for evaluating a binary classification model is the receiver operating characteristic (ROC) curve and the area under the curve (AUC). This work details the underlying statistics of ROC curves, contributing a comprehensive review of ROC curve construction and analysis techniques, including a novel graphic that illustrates the connection between ROC curves and classifier decision values. The major innovations of this work include synergistic classifier fusion through the analysis of ROC curves and rankings, insight into the statistical behavior of the Gaussian kernel, and novel methods for applying machine learning techniques to computer intrusion detection.
The primary empirical vehicle for this research is computer intrusion detection data, and both host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) are addressed. Empirical studies also include military tactical scenarios.
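As an illustration of the link between classifier decision values and the ROC curve mentioned in the abstract, the following is a minimal Python sketch and is not taken from the thesis. The function names `roc_points` and `auc`, and the sample scores and labels, are hypothetical. It assumes that a higher decision value indicates the positive (breach) class, sweeps a threshold over the decision values to trace the curve, and integrates the curve with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold
    over the decision values, from the highest score downward.
    Assumes labels are 1 (positive) or 0 (negative) and that both
    classes are present."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied decision values together so ties yield one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical unbalanced sample: 2 positives among 10 instances.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(auc(roc_points(scores, labels)))
```

In this sample, 15 of the 16 positive-negative pairs are ranked correctly, which matches the equivalent rank-based interpretation of AUC.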
Theses or dissertations
Security Classification Problems
Evangelista, Paul, "The Unbalanced Classification Problem: Detecting Breaches in Security" (2006).