Predicting Phishing Vulnerabilities Using Machine Learning
This paper examines whether machine learning can predict an undergraduate student's actions upon receiving a phishing email. The machine learning models used in this work were trained on actual phishing results augmented with students' background and administrative data. The ultimate goal of this project is to better identify the members of an organization who are at risk from phishing so that they can be given targeted cybersecurity training. Such targeted training will strengthen the security posture of an organization while minimizing unnecessary training and productivity loss. The results of multiple machine learning techniques demonstrate that this approach is viable, with validation accuracy ranging from 49% to 86%. Other metrics are also used to evaluate the viability of the approaches; of these, recall is determined to be the most important. The model with the best performance in validation on these two metrics was a Support Vector Machine (SVM), which predicted whether a cadet would be compromised upon receipt of a phishing attack with 55% accuracy while maintaining a recall score of 71%. When the trained models were applied to new data after training and validation, the Logistic Regression model had the highest performance, predicting whether a cadet would be compromised upon receipt of a phishing attack with 86% accuracy while maintaining a recall score of 16%.
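The gap between accuracy and recall reported above can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical; they are not the paper's data and were chosen only so that the arithmetic reproduces an 86% accuracy alongside a 16% recall. The function names `accuracy` and `recall` are likewise illustrative, and "positive" is assumed to mean "compromised by the phishing email".

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Fraction of actually compromised recipients that the model flagged."""
    return tp / (tp + fn)


# Hypothetical counts (NOT from the paper): 200 recipients, 25 of whom
# were actually compromised. The model flags only 4 of those 25.
tp, fp, fn, tn = 4, 7, 21, 168

print(f"accuracy = {accuracy(tp, fp, fn, tn):.2f}")  # accuracy = 0.86
print(f"recall   = {recall(tp, fn):.2f}")            # recall   = 0.16
```

Under these assumed counts, most recipients are not compromised, so a model that rarely predicts "compromised" can still score high accuracy while missing most at-risk users. This is one plausible reading of why recall would be weighted heavily when the goal is to select people for targeted training.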
Training, Support vector machines, Phishing, Organizations, Machine learning, Predictive models, Data models
S. Rutherford, K. Lin and R. W. Blaine, "Predicting Phishing Vulnerabilities Using Machine Learning," SoutheastCon 2022, Mobile, AL, USA, 2022, pp. 779-786, doi: 10.1109/SoutheastCon48659.2022.9764045.