Cyber threats delivered by email continue to grow. Anti-spam filters achieve good performance, but some spam emails still slip through them. A fraction of these are particularly dangerous, as they attempt to breach the company's security policy (e.g., inducing a manager to authorize a payment to a fraudulent bank account). In this paper we propose an automated system to detect such emails, which pass through anti-spam filters and are potentially very dangerous. Our dataset consists of real spam emails reported, collected, and labelled as critical or not by human analysts, every day over the last year, in a large company's inbox. We first study the characteristics of dangerous emails and then train and apply different supervised machine learning classifiers to detect them. Our results highlight the main distinguishing characteristics of such emails and show that (a) Support Vector Machine and Random Forest classifiers achieve the best performance; (b) the full feature set yields up to 97% recall and up to 92% precision with supervised approaches; (c) highly dangerous spam emails can be easily detected with only 21 features.
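The supervised pipeline described above can be sketched as follows. This is a hypothetical illustration: the synthetic data, the 21-feature count as generated here, and the classifier settings are assumptions standing in for the paper's real labelled corporate-inbox corpus.

```python
# Hypothetical sketch: synthetic stand-in for the 21 email features
# (e.g. header, URL, and wording cues); not the paper's actual dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Imbalanced binary task: most spam is not "highly dangerous".
X, y = make_classification(n_samples=2000, n_features=21, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("SVM", SVC(kernel="rbf", class_weight="balanced")),
                  ("Random Forest", RandomForestClassifier(n_estimators=100,
                                                           random_state=0))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(f"{name}: precision={precision_score(y_te, pred):.2f} "
          f"recall={recall_score(y_te, pred):.2f}")
```

`class_weight="balanced"` compensates for the skew toward non-critical spam, which matters when recall on the dangerous class is the primary goal.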
Interest in poisoning attacks and backdoors has recently resurfaced for Deep Learning (DL) applications. Several successful defense mechanisms have been proposed for Convolutional Neural Networks (CNNs), for example in the context of autonomous driving. We show that visualization approaches can aid in identifying a backdoor independently of the classifier used. Surprisingly, we find that common defense mechanisms fail utterly to remove backdoors in DL for Intrusion Detection Systems (IDSs). Finally, we devise pruning-based approaches to remove backdoors for Decision Trees (DTs) and Random Forests (RFs) and demonstrate their effectiveness on two different network security datasets.
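One simple form of pruning-based mitigation for a Random Forest can be sketched as below: score every tree on a small, trusted validation set and drop the low scorers before voting. This is a hedged illustration of the general idea, not the paper's method; the synthetic data and the median threshold are assumptions.

```python
# Hedged sketch: prune RF trees that perform poorly on clean validation data,
# on the assumption that backdoored trees disagree with clean behavior.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=20, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=1)

forest = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_tr, y_tr)

# Score each tree on the trusted validation data; keep the better half.
tree_acc = np.array([t.score(X_val, y_val) for t in forest.estimators_])
kept = [t for t, a in zip(forest.estimators_, tree_acc) if a >= np.median(tree_acc)]

def pruned_predict(trees, X):
    """Majority vote over the retained trees only."""
    votes = np.stack([t.predict(X) for t in trees])   # (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

print("trees kept:", len(kept), "/", len(forest.estimators_))
print("pruned accuracy:", (pruned_predict(kept, X_val) == y_val).mean())
```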
Signature-based Network Intrusion Detection Systems (NIDSs) have traditionally been used to detect malicious traffic, but they are incapable of detecting new threats. As a result, anomaly-based NIDSs built on neural networks are beginning to receive attention due to their ability to seek out new attacks. It has been shown in other domains, however, that neural networks are vulnerable to adversarial example attacks, and previously proposed anomaly-based NIDSs have not been evaluated in such adversarial settings. In this paper, we show how to evaluate an anomaly-based NIDS trained on network traffic in the face of adversarial inputs. We show how to craft adversarial inputs in the highly constrained network domain, and we evaluate three recently proposed NIDSs in an adversarial setting.
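The key difficulty the abstract points to is that network features are constrained: an attacker cannot set them arbitrarily the way image pixels can be perturbed. A minimal, hypothetical sketch of such a constrained evasion attempt (not the paper's attack) is a greedy random search that only modifies attacker-controllable features and keeps all values non-negative:

```python
# Illustrative sketch (assumptions throughout): random-search evasion against
# a one-class anomaly detector, under network-domain constraints: features
# stay non-negative and only "mutable" ones (e.g. padding-controllable
# size/timing features) may change.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=5.0, scale=1.0, size=(500, 6))  # benign flow features
detector = IsolationForest(random_state=0).fit(normal)

attack = np.full(6, 12.0)          # a clearly anomalous flow
mutable = np.array([0, 1, 2])      # indices the attacker can actually control

x = attack.copy()
for _ in range(2000):
    if detector.predict(x.reshape(1, -1))[0] == 1:   # 1 = predicted normal
        break
    cand = x.copy()
    cand[mutable] = np.maximum(0.0, cand[mutable] + rng.normal(0, 0.5, size=3))
    # Greedily keep the candidate if the detector finds it less anomalous.
    if detector.score_samples(cand.reshape(1, -1)) > \
       detector.score_samples(x.reshape(1, -1)):
        x = cand

print("evaded:", detector.predict(x.reshape(1, -1))[0] == 1)
```

Whether evasion succeeds depends on how much of the anomaly signal lies in the immutable features, which is exactly the constraint the network domain imposes.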
The application of unsupervised learning approaches, and in particular of clustering techniques, is a powerful means of exploration for the analysis of network measurements. Discovering underlying data characteristics, grouping similar measurements together, and identifying potential patterns of interest are some of the applications that can be tackled through clustering. Being unsupervised, clustering does not always provide precise and clear insight into its output, especially when the structure and distribution of the input data are complex and difficult to grasp. In this paper we introduce EXPLAIN-IT, a methodology which deals with unlabeled data, creates meaningful clusters, and suggests an explanation of the clustering results to the end user. EXPLAIN-IT relies on a novel explainable Artificial Intelligence (AI) approach, which reveals the reasons leading to a particular decision of a supervised learning-based model, extending its application to the unsupervised learning domain. We apply EXPLAIN-IT to the problem of YouTube video quality classification under encrypted-traffic scenarios, showing promising results.
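The general pattern of carrying a supervised explanation technique over to clustering can be sketched as follows. This is an assumed illustration of the idea, not EXPLAIN-IT itself: cluster the unlabeled data, fit a supervised proxy model on the cluster labels, then explain the proxy with permutation importance.

```python
# Hedged sketch of supervised explanations for unsupervised clusters:
# synthetic data, KMeans clustering, and a Random Forest proxy model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, _ = make_blobs(n_samples=600, centers=3, n_features=5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Proxy classifier trained to reproduce the cluster assignments.
proxy = RandomForestClassifier(random_state=0).fit(X, labels)

# Which features drive cluster membership?
imp = permutation_importance(proxy, X, labels, n_repeats=5, random_state=0)
for i in np.argsort(imp.importances_mean)[::-1]:
    print(f"feature {i}: importance={imp.importances_mean[i]:.3f}")
```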
Recent studies have demonstrated that machine learning can be useful for application-oriented network traffic classification. However, a network operator may not be able to infer the application of a traffic flow due to the frequent appearance of new applications, or due to privacy and other constraints set by regulatory bodies. In this work, we consider traffic flow classification based on the class of service (CoS), using delay sensitivity as an example in this preliminary study. Our focus is on direct CoS classification without first inferring the application. Our experiments with real-world encrypted TCP flows show that this direct approach can be substantially more accurate than a two-step approach that first classifies the flows based on their applications. However, without invoking application labels, the direct approach is more opaque than the two-step approach. Therefore, to provide a human-understandable interpretation of the trained learning model, we further propose an explanation framework that utilizes groups of superfeatures defined using domain knowledge and their Shapley values in a cooperative game that mimics the learning model. Our experimental results further demonstrate that this explanation framework is consistent and provides important insights into the classification results.
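With only a handful of superfeature groups, the Shapley value of each group can be computed exactly by enumerating all coalitions. The sketch below is a hedged illustration of that computation; the group names, the grouping of feature indices, and the characteristic function (model accuracy with absent groups mean-imputed) are all assumptions, not the paper's definitions.

```python
# Hedged sketch: exact group-level Shapley values over a few assumed
# superfeature groups, with accuracy as the coalition value function.
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

groups = {"size": [0, 1], "timing": [2, 3], "direction": [4, 5]}  # assumed
mean = X_tr.mean(axis=0)

def value(coalition):
    """Accuracy when only the groups in `coalition` keep their real values."""
    Xm = np.tile(mean, (len(X_te), 1))
    for g in coalition:
        Xm[:, groups[g]] = X_te[:, groups[g]]
    return model.score(Xm, y_te)

names, n = list(groups), len(groups)
shapley = {}
for g in names:
    others = [o for o in names if o != g]
    phi = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += w * (value(S + (g,)) - value(S))
    shapley[g] = phi
print(shapley)
```

By the efficiency property, the group values sum to the accuracy gap between the full model and the all-imputed baseline, which makes the attribution easy to sanity-check.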
The Border Gateway Protocol (BGP) coordinates the connectivity and reachability among Autonomous Systems, providing efficient operation of the global Internet. Historically, BGP anomalies have disrupted network connections on a global scale, so detecting them is of great importance. Today, Machine Learning (ML) methods have improved BGP anomaly detection using volume and path features of BGP's update messages, which are often noisy and bursty. In this work, we identify graph features for detecting BGP anomalies, which are arguably more robust than traditional features. We evaluate these features through an extensive comparison of different ML algorithms, namely the Naive Bayes classifier (NB), Decision Trees (DT), Random Forests (RF), Support Vector Machines (SVM), and the Multi-Layer Perceptron (MLP), to specifically detect BGP path leaks. We show that SVM offers a good trade-off between precision and recall. Finally, we provide insights into the characteristics of the graph features during anomalous and non-anomalous intervals and an interpretation of the ML classifiers' results.
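The pipeline from AS paths to graph features to a classifier can be sketched as below. The tiny AS paths, the specific features (node/edge counts, average and maximum degree), and the labels are illustrative assumptions, not the paper's feature set or data.

```python
# Illustrative sketch: summarize per-interval AS graphs built from update
# paths and feed the features to an SVM. AS paths here are made up.
from collections import defaultdict

import numpy as np
from sklearn.svm import SVC

def graph_features(as_paths):
    """Collapse a list of AS paths into an undirected graph and summarize it."""
    neigh = defaultdict(set)
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            neigh[a].add(b)
            neigh[b].add(a)
    degrees = np.array([len(v) for v in neigh.values()])
    return [len(neigh), degrees.sum() / 2, degrees.mean(), degrees.max()]

normal = [graph_features([[1, 2, 3], [1, 2, 4], [5, 2, 3]]) for _ in range(10)]
# A "path leak"-like interval: many paths funneled through one unusual AS.
leak = [graph_features([[1, 9, 2, 3], [4, 9, 5], [6, 9, 7], [8, 9, 3]])
        for _ in range(10)]

X = np.array(normal + leak)
y = np.array([0] * 10 + [1] * 10)
clf = SVC(kernel="rbf").fit(X, y)
print("train accuracy:", clf.score(X, y))
```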
Industry 4.0 holds the promise of greater automation and productivity, but it also introduces new security risks to critical industrial control systems from unsecured devices and machines. Because host-centric IT security solutions, such as anti-virus software and patching, have been ineffective at preventing IoT devices from being compromised, networks need to play a larger role in stopping attacks before they disrupt essential infrastructure. We propose a network-centric, behavior-learning-based anomaly detection approach for securing such vulnerable environments. We demonstrate that the predictability of TCP traffic from IoT devices can be exploited to detect different types of DDoS attacks in real time, using unsupervised machine learning (ML). From a small set of features, our ML classifier can separate normal from anomalous traffic. Our approach can be incorporated into a larger system for identifying compromised end-points despite IP spoofing, thus allowing SDN-based mechanisms to block attack traffic close to its source. Compared to supervised ML methods, our unsupervised ML approaches are easier to instrument and more effective at detecting new, unseen attacks.
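The unsupervised idea, learning what normal device traffic looks like and flagging deviations, can be sketched as follows. This is a hedged, generic illustration, not the paper's system: the per-flow features, the cluster count, and the 99th-percentile threshold are assumptions.

```python
# Hedged sketch: learn clusters of normal TCP behavior, then flag flows
# whose distance to the nearest centroid exceeds a threshold derived from
# the training data. Feature values are synthetic stand-ins.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Assumed per-flow features, e.g. [SYN rate, packet rate, mean packet size].
normal = rng.normal([1.0, 50.0, 500.0], [0.2, 5.0, 50.0], size=(500, 3))
attack = rng.normal([30.0, 900.0, 60.0], [5.0, 50.0, 10.0], size=(50, 3))

scaler = StandardScaler().fit(normal)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaler.transform(normal))

def anomaly_score(X):
    """Distance to the nearest centroid in scaled feature space."""
    return km.transform(scaler.transform(X)).min(axis=1)

# Threshold tuned on normal traffic only: no attack labels are needed.
threshold = np.percentile(anomaly_score(normal), 99)
print("attack flows flagged:", (anomaly_score(attack) > threshold).mean())
```

Because the threshold is set from benign traffic alone, the detector needs no labelled attack samples, which is what makes the approach applicable to new and unseen attacks.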