The Internet is composed of networks, called Autonomous Systems (ASes), that are interconnected with each other, thus forming a large graph. Although the AS-graph is known and a multitude of data is available for the ASes (i.e., node attributes), research on applying graph machine learning (ML) methods to Internet data has not attracted much attention. In this work, we provide a benchmarking framework that aims to facilitate research on Internet data using graph-ML and graph neural network (GNN) methods. Specifically, we compile a dataset with heterogeneous node/AS attributes by collecting data from multiple online sources and preprocessing them so that they can be easily used as input to GNN architectures. Then, we create a framework/pipeline for applying GNNs to the compiled data. For a set of tasks, we benchmark different GNN models (as well as non-GNN ML models) to test their efficiency; our results can serve as a common baseline for future research and provide initial insights into the application of GNNs to Internet data.
The Border Gateway Protocol (BGP) is central to the global connectivity of the Internet, enabling fast and efficient dissemination of routing information. Hence, detecting any anomaly concerning BGP announcements is of critical importance to ensure the continuous operation of Internet services.
Typically, BGP anomaly detection algorithms have relied on features of the BGP messages, such as the average length of the AS_PATH attribute, the volume of messages, or the type of message (announcement or withdrawal). Even though these algorithms provide good performance, they do not take into account the BGP topology, that is, the graph of ASes created by the BGP announcements.
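As an illustration of the two views contrasted above, the following minimal sketch computes message-level features (volume, message types, average AS_PATH length) and, from the same messages, the AS-level edges implied by the AS_PATH attributes. The message tuples, the function names, and the AS numbers (taken from the range reserved for documentation) are assumptions made for this example; they are not the data format or the feature set of any particular detector.

```python
from collections import Counter

# Hypothetical BGP messages as (type, AS_PATH) pairs.
# Withdrawals carry no AS_PATH, so their path is empty.
messages = [
    ("announcement", [64500, 64501, 64502]),
    ("announcement", [64500, 64503]),
    ("withdrawal", []),
    ("announcement", [64504, 64501, 64502]),
]


def message_features(msgs):
    """Message-level features for one observation window."""
    types = Counter(kind for kind, _ in msgs)
    paths = [path for kind, path in msgs if kind == "announcement"]
    avg_len = sum(len(p) for p in paths) / len(paths) if paths else 0.0
    return {
        "volume": len(msgs),
        "announcements": types["announcement"],
        "withdrawals": types["withdrawal"],
        "avg_as_path_len": avg_len,
    }


def as_graph_edges(msgs):
    """Undirected AS-level edges: consecutive ASes in an AS_PATH are neighbors."""
    edges = set()
    for kind, path in msgs:
        if kind != "announcement":
            continue
        for a, b in zip(path, path[1:]):
            if a != b:  # ignore AS-path prepending (the same AS repeated)
                edges.add((min(a, b), max(a, b)))
    return edges


features = message_features(messages)
edges = as_graph_edges(messages)
print(features)
print(sorted(edges))
```

The feature dictionary discards which ASes appear next to each other, whereas the edge set keeps exactly that information; this is the topology that message-level detectors do not use.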
In this paper, we investigate whether this topology can be useful for predicting BGP anomalies. We leverage Graph Neural Networks (GNNs), a subset of the Neural Network (NN) family designed to process graph-structured data. We propose a GNN model to detect BGP anomalies and study its generalization capability. We compare its performance with two baseline models: a Support Vector Machine (SVM) and a Multilayer Perceptron (MLP), two Machine Learning (ML) techniques used in state-of-the-art solutions. Our GNN model achieves an accuracy of 79.6% using a weakly supervised dataset of 300 anomalies and outperforms the two baseline models.
We provide a highly efficient solution to the classical problem of scheduling task graphs corresponding to complex applications on distributed computing systems. A number of heuristics have previously been proposed to optimize task scheduling with respect to different metrics (e.g., makespan and throughput). However, they tend to be slow to run, particularly for larger problem instances, which limits their applicability in more dynamic systems. Motivated by the goal of solving these problems more rapidly, we propose, for the first time, a graph convolutional network-based scheduler (GCNScheduler). By carefully integrating the inter-task data dependency structure and the computational network into a single input graph, the GCNScheduler can efficiently schedule tasks of complex applications for a given objective. We use simulations to illustrate that our scheme not only learns quickly and efficiently from existing scheduling schemes, but can also easily be applied to large-scale settings that current scheduling schemes fail to handle. We demonstrate the generalization of GCNScheduler to unseen real-world applications and show that it achieves almost the same makespan and throughput as benchmarks, while providing scheduling times that are several orders of magnitude faster.
TCP throughput and RTT prediction are essential for modeling TCP behavior and optimizing network configurations. Flows adapt their sending rate to network parameters such as link capacity or buffer size, and they interact with parallel flows. In particular, the elastic behavior of TCP congestion control can vary even when only slight changes occur in the network. Thus, existing analytical models of TCP behavior reach their limits due to the number and complexity of different algorithms. Machine learning approaches, in contrast, are often fixed to specific network topologies.
This paper presents a TCP bandwidth and RTT prediction approach that can handle different algorithms and topologies. For this, we utilize Gated Graph Neural Networks and simulated network traffic. We evaluate different encodings of the input data into graphs and how network size, number of flows, and TCP algorithms influence prediction accuracy. Additionally, we quantify the impact of different input features on our models. We show that Graph Neural Networks can be used to model TCP behavior. The resulting models can predict RTT with a median relative error of 2.29% and throughput with an error of 13.31%.
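For readers unfamiliar with the metric, the median relative error reported above can be computed as in the following minimal sketch. The RTT values and the function name are made up for illustration; they are not taken from the evaluation.

```python
from statistics import median


def median_relative_error(predicted, actual):
    """Median of |predicted - actual| / |actual| over all samples."""
    errors = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return median(errors)


# Made-up RTT samples in milliseconds (assumed values, for illustration only).
actual_rtt = [20.0, 40.0, 50.0, 100.0, 200.0]
predicted_rtt = [21.0, 39.0, 50.0, 110.0, 190.0]

mre = median_relative_error(predicted_rtt, actual_rtt)
print(f"median relative error: {mre:.2%}")
```

Using the median rather than the mean keeps a few badly predicted flows from dominating the reported error.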
Wireless networks have progressed exponentially over the last decade, and modern wireless networking is today a complex, hard-to-manage tangle, serving an ever-growing number of end devices through a plethora of technologies. The broad range of use cases supported by wireless networking requires smarter resource allocation approaches that make the most of the scarce wireless resources. We address the problem of user association (UA) in wireless systems. We consider a particularly challenging setup for UA, represented by modern ad-hoc networks such as FANETs, where connectivity is provided by a group of unmanned aerial vehicles (UAVs). We introduce GROWS, a Deep Reinforcement Learning (DRL) driven approach to efficiently connect wireless users to the network, leveraging Graph Neural Networks (GNNs) to better model the function of expected rewards. While GROWS is not tied to any specific wireless technology, the decentralized nature of FANETs and the lack of a pre-existing infrastructure make them a perfect case study. We show that GROWS learns UA policies for FANETs that largely outperform currently used association heuristics, realizing up to 20% higher throughput utility while reducing user rejection by more than 90%, and that these policies are robust to concept drifts in the expected traffic load, maintaining performance improvements for previously unseen traffic loads.
Airtime interference is a key performance indicator for WLANs, measuring, for a given time period, the percentage of time during which a node is forced to wait for other transmissions before transmitting or receiving. Being able to accurately estimate the interference resulting from a given state change (e.g., channel, bandwidth, power) would allow better control of WLAN resources, by assessing the impact of a given configuration before actually implementing it.
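The definition above can be made concrete with a minimal sketch that takes the intervals during which a node was forced to wait and returns the percentage of the observation period they cover. The interval representation, the function name, and the sample values are assumptions made for this example; real deployments would derive such a figure from access point counters rather than from explicit intervals.

```python
def airtime_interference(wait_intervals, period_start, period_end):
    """Percentage of [period_start, period_end] spent waiting for others."""
    duration = period_end - period_start
    if duration <= 0:
        raise ValueError("period must have positive length")

    # Clip each interval to the period and drop those that fall outside it.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in wait_intervals
        if min(end, period_end) > max(start, period_start)
    )

    # Merge overlapping intervals so that no waiting time is counted twice.
    waited = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                waited += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        waited += cur_end - cur_start

    return 100.0 * waited / duration


# Made-up waiting intervals in seconds within a 10-second period.
intervals = [(0.0, 2.0), (1.0, 3.0), (8.0, 12.0)]
pct = airtime_interference(intervals, 0.0, 10.0)
print(f"airtime interference: {pct:.1f}%")
```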
In this paper, we adopt a principled approach to interference estimation in WLANs. We first use real data to characterize the factors that impact it, and derive a set of relevant synthetic workloads for a controlled comparison of various deep learning architectures in terms of accuracy, generalization, and robustness to outlier data. We find, unsurprisingly, that Graph Convolutional Networks (GCNs) yield the best performance overall, leveraging the graph structure inherent to campus WLANs. We notice that, unlike, e.g., LSTMs, they struggle to learn the behavior of specific nodes unless they are additionally given the node indices. We finally verify the generalization capabilities of the GCN model by applying trained models to operational deployments unseen at training time.
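To show what leveraging the graph structure means in practice, the following minimal sketch implements one graph-convolution step in the commonly used form ReLU(D^-1/2 (A + I) D^-1/2 X W), where A is the adjacency matrix, X the node features, and W a weight matrix. This is the standard textbook layer, not necessarily the architecture evaluated here; the three-node topology, the feature values, and the weights are assumptions chosen for illustration.

```python
import math


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    n_in = len(weight)
    n_out = len(weight[0])

    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]

    # Symmetric normalization by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

    # Aggregate neighbor features, then apply the weights and ReLU.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Hypothetical three-node path graph 0 - 1 - 2 with one feature per node.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0], [2.0], [1.0]]
weight = [[1.0]]  # identity weight, so the output shows the aggregation only

out = gcn_layer(adj, features, weight)
print(out)
```

Nodes 0 and 2 hold the same feature and occupy symmetric positions, so the layer gives them identical outputs. This illustrates why such a model cannot tell individual nodes apart unless it is given additional inputs such as node indices.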
The advent of 5G networks has attracted a flurry of measurement studies to understand their performance in various settings. Unfortunately, carrying out an in-depth measurement study of 5G is both laborious and costly. The measurement samples cover only limited points in a (potentially large) coverage area of one or more 5G towers/base stations. In this paper, we tackle the following basic question: given a collection of 5G "signal" measurements collected at a limited number of locations in a target 5G coverage area, can we infer or extrapolate 5G "signals" at other locations within the area for which we do not have samples? We propose a novel learning paradigm based on graph neural networks (GNNs), dubbed 5GNN, which captures both the "local" and "global" patterns of the underlying spatial correlation of 5G signals based on the measured data points. This paradigm is guided by insights from the physical characteristics of 5G networks. We conduct comprehensive experiments and evaluations using both synthetic and real-world datasets, which we collected and processed ourselves with professional tools. Compared with baseline models using existing GNNs, 5GNN is superior and can reduce the estimation errors for the signal imputation task and the channel quality regression task by up to 12.8% and 9.2%, respectively.
With the development of 5G and IoT networks, Device-to-Device (D2D) communication has become a major paradigm in wireless communication. Most existing approaches for D2D resource allocation are time consuming and demand a high computational budget, especially in heterogeneous deployments where the D2D links have different configurations (i.e., different numbers of transmit and receive antennas). Recently, graph neural networks (GNNs) have been proposed to solve many problems in the networking domain and have significantly outperformed traditional algorithms, including on throughput optimization problems in D2D networks. However, existing throughput optimization works either apply only to MISO or SISO D2D networks or require extremely long runtimes on MIMO D2D networks, which makes it hard to apply them in real-world D2D applications. In this paper, we consider the throughput prediction problem across a fixed association of transmitters and receivers to maximize the total throughput in heterogeneous MIMO D2D networks. We model the interference between different link types as heterogeneous edges and learn the optimal beamforming policy using a heterogeneous GNN. Simulation results show that our proposed GNN-based approach achieves a significant speedup compared with the state-of-the-art algorithm, while providing robust performance on large-scale synthetic datasets.