Big-DAMA '19: Proceedings of the 3rd ACM CoNEXT Workshop on Big DAta, Machine Learning and Artificial Intelligence for Data Communication Networks

SESSION: AI/ML for Network Security and Intrusion Detection

Identifying threats in a large company's inbox

  • Luigi Gallo
  • Alessio Botta
  • Giorgio Ventre

Cyber threats in emails continue to grow. Anti-spam filters have achieved good performance, but several spam emails still pass through them. Some of these are particularly dangerous because they attempt to breach the company's security policy (e.g., by inducing a manager to authorize a payment to a fraudulent bank account). In this paper we propose an automated system to detect such emails, which pass through the anti-spam filter and are potentially very dangerous. Our dataset is composed of real spam emails reported, collected, and labelled as critical or not by human analysts every day over the last year in a large company's inbox. We first study the characteristics of dangerous emails and then train and use different supervised machine learning classifiers to detect them. Our results highlight the main distinguishing characteristics of such emails and show that (a) Support Vector Machine and Random Forest classifiers achieve the best performance; (b) the full feature set considered yields up to 97% recall and up to 92% precision with supervised approaches; (c) highly dangerous spam emails can be easily detected with only 21 features.
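
As a reminder of what the recall and precision figures above measure, here is a minimal sketch in Python. The labels are hypothetical (1 = critical email, 0 = non-critical) and are not taken from the paper's dataset; the function name is illustrative.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = critical email."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # critical, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # harmless, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # critical, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical analyst labels and classifier outputs (not real data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

High recall means few critical emails are missed; high precision means few harmless emails are escalated to analysts.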

Walling up Backdoors in Intrusion Detection Systems

  • Maximilian Bachl
  • Alexander Hartl
  • Joachim Fabini
  • Tanja Zseby

Interest in poisoning attacks and backdoors has recently resurfaced for Deep Learning (DL) applications. Several successful defense mechanisms have recently been proposed for Convolutional Neural Networks (CNNs), for example in the context of autonomous driving. We show that visualization approaches can aid in identifying a backdoor independent of the classifier used. Surprisingly, we find that common defense mechanisms fail utterly to remove backdoors in DL for Intrusion Detection Systems (IDSs). Finally, we devise pruning-based approaches to remove backdoors for Decision Trees (DTs) and Random Forests (RFs) and demonstrate their effectiveness for two different network security datasets.

Towards Evaluation of NIDSs in Adversarial Setting

  • Mohammad J. Hashemi
  • Greg Cusack
  • Eric Keller

Signature-based Network Intrusion Detection Systems (NIDSs) have traditionally been used to detect malicious traffic, but they are incapable of detecting new threats. As a result, anomaly-based NIDSs, built on neural networks, are beginning to receive attention due to their ability to seek out new attacks. However, neural networks have been shown to be vulnerable to adversarial example attacks in other domains, and previously proposed anomaly-based NIDSs have not been evaluated in such adversarial settings. In this paper, we show how to evaluate an anomaly-based NIDS trained on network traffic in the face of adversarial inputs. We show how to craft adversarial inputs in the highly constrained network domain, and we evaluate three recently proposed NIDSs in an adversarial setting.

SESSION: AI/ML for Network Traffic Analysis

EXPLAIN-IT: Towards Explainable AI for Unsupervised Network Traffic Analysis

  • Andrea Morichetta
  • Pedro Casas
  • Marco Mellia

The application of unsupervised learning approaches, and in particular of clustering techniques, is a powerful means of exploration for the analysis of network measurements. Discovering underlying data characteristics, grouping similar measurements together, and identifying possible patterns of interest are some of the applications that can be tackled through clustering. Being unsupervised, clustering does not always provide precise and clear insight into the produced output, especially when the input data structure and distribution are complex and difficult to grasp. In this paper we introduce EXPLAIN-IT, a methodology which deals with unlabeled data, creates meaningful clusters, and suggests an explanation of the clustering results for the end-user. EXPLAIN-IT relies on a novel explainable Artificial Intelligence (AI) approach, which makes it possible to understand the reasons leading to a particular decision of a supervised learning-based model, and extends its application to the unsupervised learning domain. We apply EXPLAIN-IT to the problem of YouTube video quality classification under encrypted traffic scenarios, showing promising results.

Explaining Class-of-Service Oriented Network Traffic Classification with Superfeatures

  • Sayantan Chowdhury
  • Ben Liang
  • Ali Tizghadam

Recent studies have demonstrated that machine learning can be useful for application-oriented network traffic classification. However, a network operator may not be able to infer the application of a traffic flow due to the frequent appearance of new applications or due to privacy and other constraints set by regulatory bodies. In this work, we consider traffic flow classification based on the class of service (CoS), using delay sensitivity as an example in this preliminary study. Our focus is on direct CoS classification without first inferring the application. Our experiments with real-world encrypted TCP flows show that this direct approach can be substantially more accurate than a two-step approach that first classifies the flows based on their applications. However, without invoking application labels, the direct approach is more opaque than the two-step approach. Therefore, to provide a human-understandable interpretation of the trained learning model, we further propose an explanation framework that utilizes groups of superfeatures defined using domain knowledge and their Shapley values in a cooperative game that mimics the learning model. Our experimental results further demonstrate that this explanation framework is consistent and provides important insights into the classification results.
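
To make the role of Shapley values concrete, the following is a minimal sketch of the standard exact Shapley computation over a small set of players. The superfeature names ("timing", "size", "burstiness") and the toy value function are hypothetical and chosen only for illustration; the abstract does not specify the paper's actual superfeatures or game.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical stand-alone contributions of each superfeature group.
BASE = {"timing": 0.3, "size": 0.2, "burstiness": 0.1}


def toy_value(coalition):
    """Toy game: additive gains plus a bonus when timing and size cooperate."""
    gain = sum(BASE[g] for g in coalition)
    if "timing" in coalition and "size" in coalition:
        gain += 0.1
    return gain


phi = shapley_values(list(BASE), toy_value)
print(phi)
```

In this toy game the interaction bonus is split equally between the two groups that create it, and the values sum to the value of the full coalition, which is the efficiency property that makes Shapley values attractive for attribution.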

SESSION: AI/ML for Network Anomaly Detection

Comparing Machine Learning Algorithms for BGP Anomaly Detection using Graph Features

  • Odnan Ref Sanchez
  • Simone Ferlin
  • Cristel Pelsser
  • Randy Bush

The Border Gateway Protocol (BGP) coordinates the connectivity and reachability among Autonomous Systems, providing efficient operation of the global Internet. Historically, BGP anomalies have disrupted network connections on a global scale, so detecting them is of great importance. Today, Machine Learning (ML) methods have improved BGP anomaly detection using volume and path features of BGP's update messages, which are often noisy and bursty. In this work, we identify different graph features to detect BGP anomalies, which are arguably more robust than traditional features. We evaluate these features through an extensive comparison of different ML algorithms, namely Naive Bayes classifier (NB), Decision Trees (DT), Random Forests (RF), Support Vector Machines (SVM), and Multi-Layer Perceptron (MLP), specifically to detect BGP path leaks. We show that SVM offers a good trade-off between precision and recall. Finally, we provide insights into the graph features' characteristics during anomalous and non-anomalous intervals and provide an interpretation of the ML classifier results.
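
As a rough illustration of what graph features derived from BGP data can look like, the sketch below builds an undirected AS-level graph from AS paths and computes a few simple statistics. The specific features (node count, edge count, average and maximum degree) and the sample paths are assumptions for illustration; the abstract does not list the features the paper uses. The AS numbers come from the range reserved for documentation.

```python
from collections import defaultdict


def build_as_graph(as_paths):
    """Build an undirected adjacency map from a list of AS paths."""
    adj = defaultdict(set)
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            if a != b:  # ignore repeated ASNs caused by path prepending
                adj[a].add(b)
                adj[b].add(a)
    return adj


def graph_features(adj):
    """Compute simple graph statistics for one observation interval."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    num_nodes = len(adj)
    return {
        "nodes": num_nodes,
        "edges": sum(degrees) // 2,  # each edge is counted at both ends
        "avg_degree": sum(degrees) / num_nodes if num_nodes else 0.0,
        "max_degree": max(degrees, default=0),
    }


# Hypothetical AS paths seen in update messages during one interval.
paths = [
    [64496, 64497, 64498],
    [64496, 64497, 64497, 64499],  # contains prepending
    [64500, 64498],
]
features = graph_features(build_as_graph(paths))
print(features)
```

Computing such a feature vector per time interval gives a classifier one row per interval, which can then be labeled as anomalous or non-anomalous for training.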

Unsupervised machine learning for network-centric anomaly detection in IoT

  • Randeep Bhatia
  • Steven Benno
  • Jairo Esteban
  • T. V. Lakshman
  • John Grogan

Industry 4.0 holds the promise of greater automation and productivity but also introduces new security risks to critical industrial control systems from unsecured devices and machines. Networks need to play a larger role in stopping attacks before they disrupt essential infrastructure, as host-centric IT security solutions, such as anti-virus and software patching, have been ineffective in preventing IoT devices from being compromised. We propose a network-centric, behavior-learning-based anomaly detection approach for securing such vulnerable environments. We demonstrate that the predictability of TCP traffic from IoT devices can be exploited to detect different types of DDoS attacks in real time, using unsupervised machine learning (ML). From a small set of features, our ML classifier can separate normal and anomalous traffic. Our approach can be incorporated in a larger system for identifying compromised end-points despite IP spoofing, thus allowing the use of SDN-based mechanisms for blocking attack traffic close to the source. Compared to supervised ML methods, our unsupervised ML approaches are easier to instrument and are more effective in detecting new and unseen attacks.
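
The idea of exploiting predictable device traffic can be illustrated with a deliberately simple stand-in: learn a baseline from unlabeled traffic and flag large deviations. This is not the authors' classifier, which the abstract does not describe; the feature (TCP packets per interval), the threshold k, and the sample values are all assumptions.

```python
from statistics import mean, pstdev


def fit_baseline(samples):
    """Learn mean and standard deviation from unlabeled baseline traffic."""
    return mean(samples), pstdev(samples)


def is_anomalous(value, baseline, k=3.0):
    """Flag a value that deviates more than k standard deviations from the mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma


# Hypothetical TCP packet counts per interval from one IoT device.
baseline = fit_baseline([100, 102, 98, 101, 99, 100, 103, 97])
flags = [is_anomalous(v, baseline) for v in [101, 250, 99]]
print(flags)
```

Because no attack labels are needed, a detector of this kind can in principle flag traffic patterns it has never seen, which is the property the abstract contrasts with supervised methods.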