SIGCOMM 2004 Conference Papers

Papers Index

A First-Principles Approach to Understanding the Internet's Router-level Topology
Vivaldi: A Decentralized Network Coordinate System
Routing Design in Operational Networks: A Look from the Inside
Locating Internet Bottlenecks: Algorithms, Measurements, and Implications
An Algebraic Approach to Practical and Scalable Overlay Network Monitoring
CapProbe: A Simple and Accurate Capacity Estimation Technique
Optimizing Cost and Performance for Multihoming
A Comparison of Overlay Routing and Multihoming Route Control
The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points
Link-level Measurements from an 802.11b Mesh Network
Comparison of Routing Metrics for Static Multi-Hop Wireless Networks
Routing in a Delay Tolerant Network
Turning the Postal System into a Generic Digital Communication Mechanism
A System for Authenticated Policy-Compliant Routing
SPV: Secure Path Vector Routing for Securing BGP
Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits
Locating Internet Routing Instabilities
Diagnosing Network-Wide Traffic Anomalies
Network Sensitivity to Hot-Potato Disruptions
Building a Better NetFlow
Work-Conserving Distributed Schedulers for Terabit Routers
Exact GPS Simulation with Logarithmic Complexity, and its Application to an Optimally Fair Scheduler
Sizing Router Buffers
A Wavelet-Based Approach to Detect Shared Congestion
Delayed Stability and Performance of Distributed Congestion Control
Impact of Configuration Errors on DNS Robustness
The Design and Implementation of a Next Generation Name Service for the Internet
A Layered Naming Architecture for the Internet
Mercury: Supporting Scalable Multi-Attribute Range Queries
Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks
A Scalable Distributed Information Management System

A First-Principles Approach to Understanding the Internet's Router-level Topology

Lun Li (CalTech), David Alderson (CalTech), Walter Willinger (AT&T Labs--Research), John Doyle (CalTech)

Abstract:
A detailed
understanding of the many
facets of the Internet's topological structure is critical for
evaluating the performance of networking protocols, for assessing the
effectiveness of proposed techniques to protect the network from
nefarious intrusions and attacks, or for developing improved designs
for resource provisioning. Previous studies of topology have focused on
interpreting measurements or on phenomenological descriptions and
evaluation of graph-theoretic properties of topology generators. We
propose a complementary approach of combining a more subtle use of
statistics and graph theory with a first-principles theory of
router-level topology that reflects practical constraints and
tradeoffs. While there is an inevitable tradeoff between model
complexity and fidelity, a challenge is to distill from the seemingly
endless list of potentially relevant technological and economic issues
the features that are most essential to a solid understanding of the
intrinsic fundamentals of network topology. We claim that very simple
models that incorporate hard technological constraints on router and
link bandwidth and connectivity, together with abstract models of user
demand and network performance, can successfully address this challenge
and further resolve much of the confusion and controversy that has
surrounded topology generation and evaluation.

Vivaldi: A Decentralized Network Coordinate System

Frank Dabek (MIT), Russ Cox (MIT), Frans Kaashoek (MIT), Robert Morris (MIT)

Abstract:
Large-scale
Internet applications can benefit from an ability to predict round-trip
times to other hosts without having to contact them first.
Explicit measurements are often unattractive because the cost of
measurement can outweigh the benefits of exploiting proximity
information. Vivaldi is a simple, light-weight algorithm that
assigns synthetic coordinates to hosts such that the distance between
the coordinates of two hosts accurately predicts the communication
latency
between the hosts.
Vivaldi is fully distributed, requiring no fixed network infrastructure
and no distinguished hosts. It is also efficient: a new host can
compute good coordinates for itself after collecting latency
information from only a few other hosts. Because it requires little
communication, Vivaldi can piggy-back on the communication patterns of
the application using it and scale to a large number of hosts.
An evaluation of Vivaldi using a simulated network whose latencies are
based on measurements among 1740 Internet hosts shows that a
2-dimensional Euclidean model with height
vectors embeds these hosts with low error (the median relative
error in round-trip time prediction is 11 percent).
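
The coordinate update at the heart of such a system can be illustrated with a spring-relaxation rule in the spirit of Vivaldi. The sketch below is a minimal two-dimensional version without the height vector; the constants cc and ce and the class structure are choices of this illustration, not code from the paper.

import random

class VivaldiNode:
    def __init__(self, dim=2, cc=0.25, ce=0.25):
        self.coord = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        self.error = 1.0            # local error estimate in [0, 1]
        self.cc, self.ce = cc, ce   # movement and error-smoothing constants

    def update(self, rtt, remote_coord, remote_error):
        # Predicted latency is the Euclidean distance between coordinates.
        diff = [a - b for a, b in zip(self.coord, remote_coord)]
        dist = sum(d * d for d in diff) ** 0.5 or 1e-9
        # Weight the sample by the relative confidence of the two nodes.
        w = self.error / (self.error + remote_error)
        rel_err = abs(dist - rtt) / rtt
        # Exponentially smooth the local error estimate.
        self.error = rel_err * self.ce * w + self.error * (1 - self.ce * w)
        # Move along the line joining the coordinates, toward or away from
        # the neighbour, proportionally to the prediction error.
        delta = self.cc * w * (rtt - dist)
        self.coord = [c + delta * d / dist for c, d in zip(self.coord, diff)]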

Routing Design in Operational Networks: A Look from the Inside

David Maltz (CMU), Geoff Xie (CMU), Jibin Zhan (CMU), Hui Zhang (CMU), Gisli Hjalmtysson (AT&T Labs--Research), Albert Greenberg (AT&T Labs--Research)

Abstract:
In any IP network,
routing protocols provide the intelligence that takes a collection of
physical links and transforms them into a network that enables packets
to travel from one host to another. Though routing design is arguably
the single most important design task for large IP networks, there has
been very little systematic investigation into how routing protocols
are actually used in production networks to implement the goals of
network architects. We have developed a methodology for reverse
engineering a coherent global view of a network's routing design from
the static analysis of dumps of the local configuration state of each
router. Starting with a set of 8,035 configuration files, we have
applied this method to 31 production networks. In this paper we
present a detailed examination of how routing protocols are used in
operational networks. In particular, the results show the conventional
model of interior and exterior gateway protocols is insufficient to
describe the diverse set of mechanisms used by architects. We provide
examples of the more unusual designs and examine their trade-offs. We
discuss the strengths and weaknesses of our methodology, and argue that
it opens paths towards new understandings of network behavior and
design.

Locating Internet Bottlenecks: Algorithms, Measurements and Implications

Ningning Hu (CMU), Li Erran Li (Bell Laboratories), Zhuoqing Morley Mao (U. Michigan), Peter Steenkiste (CMU), Jia Wang (AT&T Labs--Research)

Abstract:
The ability to
locate network bottlenecks along end-to-end paths on the Internet is of
great interest to both network operators and researchers. For
example, knowing where the bottleneck links are, network operators can apply traffic engineering at either the interdomain or intradomain level to improve routing. Existing tools either fail to identify the
location of bottlenecks, or generate a large amount of probing packets.
In addition, they often require access to both end points. In
this paper we present Pathneck, a tool that allows end users to
efficiently and accurately locate the bottleneck link on an Internet
path. Pathneck is based on a novel probing technique called
Recursive Packet Train (RPT) and does not require access to the
destination. We evaluate Pathneck using wide area Internet
experiments and trace-driven emulation. In
addition, we present the results of an extensive study on bottlenecks
in the Internet using carefully selected, geographically diverse
probing sources and destinations. We found that Pathneck can
successfully detect bottlenecks for almost 80% of the Internet paths we
probed. We also report our success in using the bottleneck
location and bandwidth bounds provided by Pathneck to infer bottlenecks
and to avoid bottlenecks in multihoming and overlay routing.

An Algebraic Approach to Practical and Scalable Overlay Network Monitoring

Yan Chen (Northwestern University), David Bindel (UC Berkeley), Hanhee Song (UC Berkeley), Randy H. Katz (UC Berkeley)

Abstract:
Overlay network
monitoring enables distributed Internet applications to detect and
recover from path outages and periods of degraded performance within
seconds. For an overlay network
with n end hosts, existing
systems either require O(n^2) measurements, and thus lack scalability,
or can only estimate the latency but not congestion or failures. Our
earlier extended
abstract briefly proposed an algebraic approach that selectively
monitors k linearly
independent paths that can fully describe all the O(n^2) paths. The
loss rates and latency of these k
paths can be used to estimate the loss rates and latency of all other
paths. Our scheme only assumes knowledge of the underlying IP topology,
with links dynamically varying between lossy and normal.
In this paper, we improve, implement and extensively evaluate such a
monitoring system. We further make the following contributions:
i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of k is bounded as O(n log n), ii)
efficient adaptation algorithms for topology changes, such as the
addition or removal of end hosts and routing changes, iii) measurement
load balancing schemes, and iv) topology measurement error
handling. Both simulation and Internet experiments demonstrate that we
obtain highly accurate path loss rate
estimation while adapting to topology changes within seconds and
handling topology errors.
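
The linear-algebraic core of this idea can be pictured with a small numpy sketch. It rests on the standard assumption that path loss composes multiplicatively over links, so path measurements are linear in the links' log transmission rates; the function and variable names are this sketch's own, not the authors' implementation.

import numpy as np

def pick_basis_paths(G):
    # Greedily keep rows of the path-by-link routing matrix G that raise its
    # rank; the selected rows are the k linearly independent paths to monitor.
    basis, rank = [], 0
    for i in range(G.shape[0]):
        if np.linalg.matrix_rank(G[basis + [i]]) > rank:
            basis.append(i)
            rank += 1
    return basis

def infer_all_paths(G, measured_log_tx, basis):
    # Solve G[basis] x = b in the least-squares sense for the link-level vector
    # x, then predict the log transmission rate of every one of the O(n^2) paths.
    x, *_ = np.linalg.lstsq(G[basis], measured_log_tx, rcond=None)
    return G @ x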

CapProbe: A Simple and Accurate Capacity Estimation Technique

Rohit Kapoor (Qualcomm), Ling-Jyh Chen (UCLA), Li Lao (UCLA), Mario Gerla (UCLA), M. Y. Sanadidi (UCLA)

Abstract:
We present a new
capacity estimation technique, called CapProbe. CapProbe combines
delay as well as dispersion measurements of packet pairs to filter out
samples distorted by cross-traffic. CapProbe algorithms include
convergence tests and convergence speed-up techniques by varying
probing parameters. Our study of CapProbe includes a probability
analysis to determine the time it takes CapProbe to converge on average. Through simulations and
measurements, we found CapProbe to be quick and accurate across a wide
range of traffic scenarios. We also compared CapProbe with two previous
well-known techniques, pathchar and pathrate. We found CapProbe to be
much more accurate than pathchar and similar in accuracy to pathrate,
while providing faster estimation than both. Another advantage of
CapProbe is its lower computation cost, since no statistical post-processing of probing data is required.
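
The filtering idea itself fits in a few lines. In the sketch below (the sample format and field order are assumptions of this illustration, not the authors' code), the packet-pair sample with the minimum sum of per-packet delays is taken as undistorted by cross-traffic, and its dispersion is converted into a capacity estimate.

def capprobe_estimate(samples, packet_size_bits):
    # samples: list of (delay_first_pkt, delay_second_pkt, dispersion_seconds)
    best = min(samples, key=lambda s: s[0] + s[1])   # minimum delay-sum filter
    return packet_size_bits / best[2]                # capacity in bits per second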

Optimizing Cost and Performance for Multihoming

David K. Goldenberg (Yale), Lili Qiu (Microsoft Research), Haiyong Xie (Yale), Yang Richard Yang (Yale), Yin Zhang (AT&T Labs--Research)

Abstract:
Multihoming is
often used by large enterprises and stub ISPs to connect to the
Internet. In this paper, we design a series of novel smart
routing algorithms to optimize cost and performance for multihomed
users. We evaluate our algorithms through both analysis and
extensive simulations based on realistic charging models, traffic
demands, performance data, and network topologies. Our results suggest
that these algorithms are very effective in minimizing cost and at the
same time improving performance. We further examine the
equilibrium performance of smart routing in a global setting and show
that a smart routing user can improve its performance without adversely
affecting other users.

A Comparison of Overlay Routing and Multihoming Route Control

Aditya Akella (CMU), Jeffrey Pang (CMU), Anees Shaikh (IBM Research), Bruce Maggs (CMU), Srinivasan Seshan (CMU)

Abstract:
The limitations of
BGP routing in the Internet are often blamed for poor end-to-end
performance and prolonged connectivity interruptions. Recent work
advocates using overlays to effectively bypass BGP's path selection in
order to improve performance and fault tolerance. In this paper,
we explore the possibility that intelligent control of BGP routes,
coupled with ISP multihoming, can provide competitive end-to-end
performance and reliability. Using extensive measurements
of paths between nodes in a large content distribution network, we
compare the relative benefits of overlay routing and multihoming route
control in terms of round-trip latency, TCP connection throughput, and
path availability. We observe that the performance achieved by route
control together with multihoming to three ISPs (3-multihoming), is
within 5-15% of overlay routing employed in conjunction with 3-multihoming,
in terms of both end-to-end RTT and throughput. We also show that while
multihoming cannot offer the nearly perfect resilience of overlays, it
can eliminate almost all failures experienced by a singly-homed
end-network. Our results demonstrate that, by leveraging the
capability of multihoming route control, it is not necessary to
circumvent BGP routing to extract good wide-area performance and
availability from the existing routing system.
An erratum for the paper was published in August 2004.

The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points

Kunwadee Sripanidkulchai (CMU), Aditya Ganjam (CMU), Bruce Maggs (CMU), Hui Zhang (CMU)

Abstract:
While application
end-point architectures have proven to be viable solutions for
large-scale distributed applications such as distributed computing and
file-sharing, little is known about their feasibility for more
bandwidth-demanding applications such as live streaming.
Heterogeneity in bandwidth resources and dynamic group membership,
inherent properties of application end-points, may adversely affect the
construction of a usable and efficient overlay. At very large scales, these problems become even more challenging.
In this paper, we study one of the most prominent architectural issues
in overlay multicast: the feasibility of supporting large-scale groups
using an application end-point architecture. We look at three key
requirements for feasibility: (i) are there enough resources to
construct an overlay, (ii) can a stable and connected overlay be
maintained in the presence of group dynamics, and (iii) can an
efficient overlay be constructed? Using traces from a large
content delivery network, we characterize the behavior of users
watching live audio and video streams. We show that in many
common real-world scenarios, all three requirements are
satisfied. In addition, we evaluate the performance of several
design alternatives and show that simple algorithms have the potential
to meet these requirements in practice. Overall, our results
argue for the feasibility of supporting large-scale live streaming
using an application end-point architecture.

Link-level Measurements from an 802.11b Mesh Network

Daniel Aguayo (MIT), John Bicket (MIT), Sanjit Biswas (MIT), Glenn Judd (CMU), Robert Morris (MIT)

Abstract:
This paper analyzes
the causes of packet loss in a 38-node urban multi-hop 802.11b network.
The patterns and causes of loss are important in the design of routing
and error-correction protocols, as well as in network planning.
The paper makes the following observations. The distribution of
inter-node loss rates is relatively uniform over the whole range of
loss rates; there is no clear threshold separating “in range” and “out
of range.” Most links have relatively stable loss rates from one second
to the next, though a small minority have very bursty losses at that
time scale. Signal-to-noise ratio and distance have little
predictive value for loss rate. The large number of links with
intermediate loss rates is probably due to multi-path fading rather than
attenuation or interference. The phenomena discussed here are all
well-known. The contributions of this paper are an understanding of
their relative importance, of how they interact, and of the
implications for MAC and routing protocol design.

Comparison of Routing Metrics for Static Multi-Hop Wireless Networks

Richard Draves (Microsoft Research), Jitendra Padhye (Microsoft Research), Brian Zill (Microsoft Research)

Abstract:
Routing protocols
for wireless ad hoc networks have traditionally focused on finding
paths with minimum hop count. However, such paths can include slow or
lossy links, leading to poor throughput. A routing algorithm can select
better paths by explicitly taking the quality of the wireless links
into account. In this paper, we conduct a detailed, empirical
evaluation of the performance of three link-quality metrics (ETX, per-hop RTT, and per-hop packet pair) and compare them against minimum
hop count. We study these metrics using a DSR-based routing protocol
running over a wireless testbed. We find that the ETX metric has the
best performance when all nodes are stationary. We also find that
the per-hop RTT and per-hop packet-pair metrics perform poorly due to
self-interference. Interestingly, the hop-count metric outperforms all
of the link-quality metrics in a scenario where the sender is mobile.
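
For reference, the ETX metric compared here is conventionally computed from measured forward and reverse delivery ratios of each link (this restatement follows the prior work that defined ETX rather than anything new in this abstract):

def etx(d_f, d_r):
    # Expected transmission count of one link, given the delivery ratio of
    # data frames (d_f) and of link-layer ACKs (d_r).
    return 1.0 / (d_f * d_r)

def path_etx(links):
    # The route metric is the sum of the ETX values of its links.
    return sum(etx(d_f, d_r) for d_f, d_r in links)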

Routing in a Delay Tolerant Network

Sushant Jain (U. Washington), Kevin Fall (Intel Research), Rabin Patra (UC Berkeley)

Abstract:
We formulate the
delay-tolerant networking routing problem, where messages are to be
moved end-to-end across a connectivity graph that is time-varying but
whose dynamics may be known in advance. The problem has the added
constraints of finite buffers at each node and the general property
that no contemporaneous end-to-end path may ever exist. This
situation limits the applicability of traditional routing approaches
that tend to treat outages as failures and seek to find an existing
end-to-end path.
We propose a framework for evaluating routing algorithms in such
environments. We then develop several algorithms and use
simulations to compare their performance with respect to
the amount of knowledge they require about network topology. We
find that, as expected, the algorithms using the least knowledge
tend to perform poorly. We also find that with limited additional
knowledge, far less than complete global knowledge, efficient
algorithms can be constructed for routing in such environments.
To the best of our knowledge this is the first such investigation of
routing issues in DTNs.
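
At the knowledge-rich end of that spectrum, routing against a fully known contact schedule reduces to an earliest-arrival search. The sketch below conveys the flavor only; the contact-tuple format, the single-message viewpoint, and the requirement that a transfer complete before its contact closes are simplifying assumptions of this illustration, not the paper's algorithms (which also model buffers and partial knowledge).

import heapq

def earliest_arrival(contacts, src, dst, t0=0.0):
    # contacts: list of (u, v, start, end, delay); the link u -> v can be used
    # during [start, end] and a transfer across it takes `delay` seconds.
    best = {src: t0}
    queue = [(t0, src)]
    while queue:
        t, u = heapq.heappop(queue)
        if u == dst:
            return t
        if t > best.get(u, float("inf")):
            continue                      # stale queue entry
        for a, b, start, end, delay in contacts:
            if a != u:
                continue
            depart = max(t, start)        # wait for the contact to open
            if depart + delay <= end:     # transfer must fit inside the contact
                arrive = depart + delay
                if arrive < best.get(b, float("inf")):
                    best[b] = arrive
                    heapq.heappush(queue, (arrive, b))
    return None                           # no contemporaneous or future path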

Turning the Postal System into a Generic Digital Communication Mechanism

Randolph Y. Wang (Princeton), Sumeet Sobti (Princeton), Nitin Garg (Princeton), Elisha Ziskind (Princeton), Junwen Lai (Princeton), Arvind Krishnamurthy (Yale)

Abstract:
The phenomenon that
rural residents and people with low incomes lag behind in Internet
access is known as the “digital divide.” This problem is particularly
acute in developing countries, where most of the world's population
lives. Bridging this digital divide, especially by attempting to
increase the accessibility of broadband connectivity, can be
challenging. The improvement of wide-area connectivity is constrained
by factors such as how quickly we can dig ditches to bury
fibers in the ground; and the cost of furnishing "last-mile" wiring can
be prohibitively high.
In this paper, we explore the use of digital storage media transported
by the postal system as a general digital communication mechanism.
While some companies have used the postal system to deliver software
and movies, none of them has turned the postal system into a truly
generic digital communication medium supporting a wide variety of
applications. We call such a generic
system a Postmanet. Compared to traditional wide-area connectivity
options, the Postmanet has several important advantages, including wide
global reach, great bandwidth potential and low cost.
Manually preparing mobile storage devices for shipment may appear
deceptively simple, but with many applications, communicating parties
and messages, manual management becomes infeasible, and systems support
at several levels becomes necessary. We explore the simultaneous
exploitation of the Internet and the Postmanet, so we can combine their
latency and bandwidth advantages to enable sophisticated
bandwidth-intensive applications.

A System for Authenticated Policy-Compliant Routing

Barath Raghavan (UCSD), Alex C. Snoeren (UCSD)

Abstract:
Internet end users
and ISPs alike have little control over how packets are routed outside
of their own AS, restricting their ability to achieve levels of
performance, reliability, and utility that might
otherwise be attained. While researchers have proposed a number of
source-routing techniques to combat this limitation, there has thus far
been no way for independent ASes to ensure that such traffic does not
circumvent local traffic policies, nor to accurately determine the
correct party to charge for forwarding the traffic.
We present Platypus, an authenticated source routing system built
around the concept of network capabilities. Network capabilities
allow for accountable, fine-grained path selection by
cryptographically attesting to policy compliance at each hop along a
source route. Capabilities can be composed to construct routes
through multiple ASes and can be delegated to third parties. Platypus
caters to the needs of both end users and ISPs: users gain the ability
to pool their resources and select routes other than the default, while
ISPs maintain control over where, when, and whose packets traverse
their networks. We describe how Platypus can be used to address
several well-known issues in wide-area routing at both the edge and the
core, and evaluate its performance, security, and interactions with
existing protocols. Our results show that incremental deployment
of Platypus can achieve immediate gains.
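
As a bare-bones picture of per-hop, capability-style attestation with symmetric keys, consider the sketch below. The key handling, the fields bound into the MAC, and the encoding are all assumptions of this illustration and are much simpler than Platypus's actual capability and binding scheme.

import hashlib
import hmac

def make_capability(as_key: bytes, waypoint: bytes, expiry: int) -> bytes:
    # The issuing AS authorizes traffic through `waypoint` until `expiry`.
    return hmac.new(as_key, waypoint + expiry.to_bytes(8, "big"),
                    hashlib.sha256).digest()

def verify_at_hop(as_key: bytes, waypoint: bytes, expiry: int, cap: bytes) -> bool:
    # A router in the AS recomputes the MAC and forwards only if it matches.
    expected = make_capability(as_key, waypoint, expiry)
    return hmac.compare_digest(expected, cap)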

SPV: Secure Path Vector Routing for Securing BGP

Yih-Chun Hu (UC Berkeley), Adrian Perrig (CMU), Marvin Sirbu (CMU)

Abstract:
As our economy and critical infrastructure increasingly rely on the Internet, the insecurity of the underlying border gateway routing protocol (BGP) stands out as its Achilles' heel. Recent misconfigurations and attacks
have demonstrated the brittleness of BGP. Securing BGP has become a
priority.
In this paper, we focus on a viable deployment path to secure BGP. We
analyze security requirements, and consider tradeoffs of mechanisms
that achieve the requirements. In particular, we study how to secure
BGP update messages against attacks. We design an efficient
cryptographic mechanism that relies only on symmetric cryptographic
primitives to guard an ASPATH from alteration, and propose the Secure
Path Vector (SPV) protocol. In contrast to the previously proposed
S-BGP protocol, SPV is around 22 times faster. With the current effort
to secure BGP, we anticipate that SPV will contribute several
alternative mechanisms to secure BGP, especially for the case of
incremental deployments.

Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits

Helen J. Wang (Microsoft Research), Chuanxiong Guo (Microsoft Research), Daniel R. Simon (Microsoft Research), Alf Zugenmaier (Microsoft Research)

Abstract:
Software patching
has not been effective as a first-line defense against large-scale worm
attacks, even when patches have long been available for their
corresponding vulnerabilities. Generally, people have been
reluctant to patch their systems immediately, because patches are
perceived to be unreliable and disruptive to apply. To address
this problem, we propose a first-line worm defense in the network
stack, using shields:
vulnerability-specific, exploit-generic network filters installed in
end systems once a vulnerability is discovered, but before a patch is
applied. These filters examine the incoming or outgoing traffic of
vulnerable applications, and drop traffic that exploits
vulnerabilities. Shields are less disruptive to install and uninstall,
easier to test for bad side effects, and hence more reliable than
traditional software patches. Further, shields are resilient to
polymorphic or metamorphic variations of exploits.
In this paper, we show that this concept is feasible by describing a
prototype Shield framework implementation that filters traffic above
the transport layer. We have designed a safe and restrictive language
to describe vulnerabilities as partial state machines of the vulnerable
application. The expressiveness of the language has been verified by
encoding the signatures of several known vulnerabilities.
In our Shield design, we model vulnerability signatures as a
combination of partial protocol state machines and
vulnerability-parsing instructions for specific payloads. For
generality, we abstract out the generic elements of application-level
protocols. For flexibility, we express vulnerability signatures and
their countermeasures in a safe and restrictive policy language,
interpreted by the Shield framework at runtime. We also minimize
Shield's maintenance of protocol state (for scalability), and apply
defensive design to ensure robustness. We have implemented a Shield
prototype and experimented with 10 known vulnerabilities, including the
ones behind the (in)famous MSBlast and Slammer worms. Our
evaluation provides evidence of Shield's low false positive rate and
small impact on application throughput. An examination of a
sample set of known vulnerabilities suggests that Shield could be used
to prevent exploitation of a substantial fraction of the most dangerous
ones.
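
To make the notion of a vulnerability signature as a partial state machine concrete, here is a deliberately tiny sketch. The states, message fields, and the over-long-field check are invented for illustration; Shield's actual policy language and framework are far richer and are driven by real protocol specifications.

class ToyShield:
    def __init__(self):
        self.state = "INIT"

    def on_message(self, msg):
        # Track only the protocol states that matter for this vulnerability.
        if self.state == "INIT" and msg.get("type") == "SESSION_SETUP":
            self.state = "NEGOTIATING"
        elif self.state == "NEGOTIATING" and msg.get("type") == "BIND":
            # Vulnerability-parsing step: an over-long field would overflow the
            # vulnerable parser, so drop it regardless of the exploit bytes used.
            if len(msg.get("payload", b"")) > 256:
                return "DROP"
            self.state = "BOUND"
        return "PASS"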

Locating Internet Routing Instabilities

Anja Feldmann (TU Muenchen), Olaf Maennel (TU Muenchen), Z. Morley Mao (U. Michigan), Arthur Berger (MIT/Akamai), Bruce Maggs (CMU/Akamai)

Abstract:
This paper presents
a methodology for identifying the autonomous system (or systems)
responsible when a routing change is observed and propagated by
BGP. The origin of such a routing instability is deduced by
examining and correlating BGP updates for many prefixes
gathered at many observation points. Although interpreting BGP
updates can be perplexing, we find that we can pinpoint the origin to
either a single AS or a session between two ASes in most cases.
We verify our methodology in two phases. First, we perform
simulations on an AS topology derived from actual BGP updates using
routing policies that are compatible with inferred
peering/customer/provider relationships. In these simulations, in
which network and router behavior are “ideal”, we inject inter-AS
link failures and demonstrate that our methodology can effectively
identify most origins of instability. We then develop several
heuristics to cope with the
limitations of the actual BGP update propagation process and monitoring
infrastructure, and apply our methodology and evaluation techniques to
actual BGP updates gathered at hundreds of observation points.
Relying on data from BGP simulations as well as from measurements enables us to evaluate the inference quality achieved by our approach under ideal conditions, and to examine how it correlates with the actual quality and with the number of observation points.

Diagnosing Network-Wide Traffic Anomalies

Anukool Lakhina (Boston University), Mark Crovella (Boston University), Christophe Diot (Intel Research)

Abstract:
Anomalies are
unusual and significant changes in a network's traffic levels, which
can often span multiple links. Diagnosing anomalies is critical for
both network operators and end users. It is a difficult problem
because one must extract and interpret anomalous patterns from
large amounts of high-dimensional, noisy data.
In this paper we propose a general method to diagnose anomalies.
This method is based on a separation of the high-dimensional space
occupied by a set of network traffic measurements into disjoint
subspaces corresponding to normal and anomalous network
conditions. We show that this separation can be performed
effectively using Principal Component Analysis.
Using only simple traffic measurements from links, we study volume
anomalies and show that the method can: (1) accurately detect when a
volume anomaly is occurring; (2) correctly identify the underlying
origin-destination (OD) flow which is the source of the anomaly; and
(3) accurately estimate the amount of traffic involved in the anomalous
OD flow.
We evaluate the method's ability to diagnose (i.e., detect,
identify, and quantify) both existing and synthetically injected volume
anomalies in real traffic from two backbone networks. Our method
consistently diagnoses the largest volume anomalies, and does so with a
very low false alarm rate.
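
The subspace separation can be sketched in a few lines of numpy. The choice of k principal components and the plain residual-energy test below are simplifications of this illustration; the paper's detector sets its thresholds statistically.

import numpy as np

def residual_energy(X, k):
    # X: time-bins x links matrix of measured traffic volumes.
    Xc = X - X.mean(axis=0)                 # center each link's time series
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]                   # projector onto the normal subspace
    residual = Xc - Xc @ P                  # component in the anomalous subspace
    return (residual ** 2).sum(axis=1)      # squared prediction error per bin

A volume anomaly would then be flagged in any time bin whose squared prediction error exceeds a chosen threshold.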

Network Sensitivity to Hot-Potato Disruptions

Renata Teixeira (UCSD), Aman Shaikh (AT&T Labs--Research), Tim Griffin (Intel Research), Geoffrey Voelker (UCSD)

Abstract:
Hot-potato routing
is a mechanism employed when there are multiple (equally good)
interdomain routes available for a given destination. In this scenario,
the Border Gateway Protocol (BGP) selects the interdomain route
associated with the closest egress point based upon intradomain path
costs. Consequently, intradomain routing changes can impact interdomain
routing and cause abrupt swings of external routes, which we call
hot-potato disruptions. Recent work has shown that hot-potato
disruptions can have a substantial impact on large ISP backbones and
thereby jeopardize the network robustness. As a result, there is a need
for guidelines and tools to assist in the design of networks that
minimize hot-potato disruptions. However, developing these tools is
challenging due to the complex and subtle nature of the interactions
between exterior and interior routing. In this paper, we address these
challenges using an analytic model of hot-potato routing that
incorporates metrics to evaluate network sensitivity to hot-potato
disruptions. We then present a methodology for computing these
metrics using measurements of real ISP networks. We demonstrate the
utility of our model by analyzing the sensitivity of AT&T's
backbone network.
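
The hot-potato tie-break whose sensitivity is being modeled is easy to state on its own; the one-liner below (with assumed inputs) is just that rule: among equally good BGP routes, choose the egress point with the smallest intradomain (IGP) path cost.

def hot_potato_egress(candidate_egresses, igp_cost):
    # candidate_egresses: routers announcing equally good BGP routes for a prefix
    # igp_cost: mapping from egress router to IGP path cost from this router
    return min(candidate_egresses, key=lambda egress: igp_cost[egress])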

Building a Better NetFlow

Cristian Estan (UCSD), Ken Keys (CAIDA), David Moore (CAIDA/UCSD), George Varghese (UCSD)

Abstract:
Network operators
need to determine the composition of the traffic mix on links when
looking for dominant applications, users, or estimating traffic
matrices. Cisco’s NetFlow has evolved into a solution that satisfies
this need by reporting flow records that summarize a sample of the
traffic traversing the link. But sampled NetFlow has shortcomings that
hinder the collection and analysis of traffic data. First, during
flooding attacks router memory and network bandwidth consumed by flow
records can increase beyond what is available; second, selecting the
right static sampling rate is difficult because no single rate gives
the right tradeoff of memory use versus accuracy for all traffic mixes;
third, the heuristics routers use to decide when a flow is reported are
a poor match to most applications that work with time bins; finally, it
is impossible to estimate without bias the number of active flows for
aggregates with non-TCP traffic.
In this paper we propose Adaptive NetFlow, deployable through an update
to router software, which addresses many shortcomings of NetFlow by
dynamically adapting the sampling rate to achieve robustness without
sacrificing accuracy. To enable counting of non-TCP flows, we propose
an optional Flow Counting Extension that requires augmenting existing
hardware at routers. Both our proposed solutions readily provide
descriptions of the traffic of progressively
smaller sizes. Transmitting these at progressively higher levels of
reliability allows graceful degradation of the accuracy of traffic
reports in response to network congestion on the reporting path.

Work-Conserving Distributed Schedulers for Terabit Routers

Prashanth Pappu (Washington University), Jonathan Turner (Washington University), Ken Wong (Washington University)

Abstract:
Buffered multistage
interconnection networks offer one of the most scalable and
cost-effective approaches to building high capacity routers.
Unfortunately, the performance of such systems has been difficult to
predict in the presence of the extreme traffic conditions that can
arise in the Internet. Recent work introduced distributed scheduling,
to regulate the flow of traffic in such systems. This work
demonstrated, using simulation and experimental measurements, that
distributed scheduling can deliver robust performance for extreme
traffic. Here, we show that distributed schedulers can be provably
work-conserving for speedups of 2 or more. Two of the three schedulers
we describe were inspired by previously published crossbar schedulers.
The third has no direct counterpart in crossbar scheduling. In our
analysis, we show that distributed schedulers based on blocking flows
in small-depth acyclic flow graphs can be work-conserving, just as
certain crossbar schedulers based on maximal bipartite matchings have
been shown to be work-conserving. We also study the performance of
practical variants of these schedulers when the speedup is less than 2,
using simulation.

Exact GPS Simulation with Logarithmic Complexity, and its Application to an Optimally Fair Scheduler

Paolo Valente (U. Pisa)

Abstract:
Generalized
Processor Sharing (GPS) is a fluid scheduling policy providing perfect
fairness. The minimum deviation (lead/lag) with respect to the GPS
service achievable by a packet scheduler is one packet size. To the
best of our knowledge, the only packet scheduler guaranteeing such
minimum deviation is Worst-case Fair Weighted Fair Queueing (WF2Q),
that requires on-line GPS simulation. Existing algorithms to perform
GPS simulation have O(N) complexity per packet transmission (N being the number of competing flows). Hence WF2Q has been charged with O(N)
complexity too. Schedulers with lower complexity have been devised, but
at the price of at least O(N) deviation from the GPS service, which has
been shown to be detrimental for real-time adaptive applications and
feedback based applications. Furthermore, it has been proven that the
lower bound complexity to guarantee O(1) deviation is Omega(log N), yet
a scheduler achieving such result has remained elusive so far.
In this paper we present an algorithm that performs exact GPS
simulation with O(log N) worst-case complexity and small constants. As
such it improves the complexity of all the packet schedulers based on
GPS simulation. In particular, using our algorithm within WF2Q, we
achieve the minimum deviation from the GPS service with O(log N)
complexity, thus matching the aforementioned lower bound. Furthermore,
we assess the effectiveness of the proposed solution by simulating
real-world scenarios.

Sizing Router Buffers

Guido Appenzeller (Stanford), Isaac Keslassy (Stanford), Nick McKeown (Stanford)

Abstract:
All Internet
routers contain buffers to hold packets during times of congestion.
Today, the size of the buffers is determined by the dynamics of TCP's
congestion control algorithm. In particular,
the goal is to make sure that when a link is congested, it is busy 100%
of the time, which is equivalent to making sure its buffer never goes empty. A widely used rule-of-thumb states that each link needs a buffer of size B = RTT × C, where RTT is the average round-trip time of a flow passing across the link and C is the data rate of the link. For example, a 10Gb/s router linecard needs approximately 250ms × 10Gb/s = 2.5Gbits of buffers, and the amount of buffering grows linearly with
the line-rate. Such large buffers are challenging for router
manufacturers, who must use large, slow, off-chip DRAMs. And queueing
delays can be long, have high variance, and may destabilize the
congestion control algorithms.
In this paper we argue that the rule-of-thumb B = RTT × C is now outdated and
incorrect for backbone routers. This is because of the large number of
flows (TCP connections) multiplexed together on a single backbone link.
Using theory, simulation and experiments on a network of real routers,
we show that a link with n
flows requires no more than B = (RTT × C) / sqrt(n), for long-lived or short-lived TCP flows. The
consequences on router design are enormous: A 2.5Gb/s link carrying
10,000 flows could reduce its buffers by 99% with negligible difference
in throughput; and a 10Gb/s link carrying 50,000 flows requires only
10Mbits of buffering, which can easily be implemented using fast,
on-chip SRAM.
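
The arithmetic behind those figures is worth spelling out. The snippet below simply evaluates the two rules quoted above for the 10Gb/s, 50,000-flow example; the roughly 11 Mbit result is consistent with the approximately 10 Mbits cited.

rtt = 0.250                                        # seconds, average round-trip time
link_rate = 10e9                                   # bits per second (C)
n_flows = 50_000

rule_of_thumb = rtt * link_rate                    # B = RTT x C -> 2.5e9 bits
small_buffers = rtt * link_rate / n_flows ** 0.5   # B = RTT x C / sqrt(n)
print(rule_of_thumb / 1e9, "Gbit versus about",
      round(small_buffers / 1e6, 1), "Mbit")       # 2.5 Gbit versus about 11.2 Mbit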

A Wavelet-Based Approach to Detect Shared Congestion

Min Sik Kim (UT Austin), Taekhyun Kim (UT Austin), YongJune Shin (UT Austin), Simon S. Lam (UT Austin), Edward J. Powers (UT Austin)

Abstract:
Per-flow congestion
control helps endpoints fairly and efficiently share network
resources. Better utilization of network resources can be
achieved, however, if congestion management algorithms can determine
when two different flows share a congested link. Such knowledge
can be used to implement cooperative congestion control or improve the
overlay topology of a P2P system. Previous techniques to detect
shared congestion either assume a common source or destination node,
drop-tail queueing, or a single point of congestion. We propose
in this paper a novel technique, applicable to any pair of paths on the
Internet, without such limitations. Our technique employs a
signal processing method, wavelet denoising, to separate queueing delay
caused by network congestion from various other delay variations.
Our wavelet-based technique is evaluated through both simulations and
Internet experiments. We show that, when detecting shared
congestion of paths with a common endpoint, our technique provides
faster convergence and higher accuracy while using fewer packets than
previous techniques, and that it also accurately determines when there
is no shared congestion. Furthermore, we show that our technique
is robust and accurate for paths without a common endpoint or
synchronized clocks; more specifically, it can tolerate a
synchronization offset of up to one second between two packet flows.

Delayed Stability and Performance of Distributed Congestion Control

Yueping Zhang (Texas A&M), Seong-Ryong Kang (Texas A&M), Dmitri Loguinov (Texas A&M)

Abstract:
Recent research
efforts to design better Internet transport protocols combined with
scalable Active Queue Management (AQM) have led to significant advances
in congestion control. One of the hottest topics in this area is the
design of discrete congestion control algorithms that are
asymptotically stable under heterogeneous feedback delay and whose
control equations do not explicitly depend on the RTTs of end-flows. In
this paper, we show that max-min fair congestion control methods with a
stable symmetric Jacobian remain stable under arbitrary feedback delay
(including heterogeneous directional delays) and that the stability
condition of such methods does not involve any of the delays. To
demonstrate the practicality of the obtained result, we
change the original controller in Kelly's work [14] to become robust
under random feedback delay and fixed constants of the control
equation. We call the resulting framework Max-min Kelly Control (MKC)
and show that it offers smooth sending rate, exponential convergence to
efficiency, and fast convergence to fairness, all of which make it
appealing for future high-speed networks.

Impact of Configuration Errors on DNS Robustness

Vasileios Pappas (UCLA), Zhiguo Xu (UCLA), Songwu Lu (UCLA), Daniel Massey (Colorado State), Andreas Terzis (Johns Hopkins), Lixia Zhang (UCLA)

Abstract:
During the past
twenty years the Domain Name System (DNS) has sustained phenomenal
growth while maintaining satisfactory performance. However, the
original design focused mainly on system robustness against physical
failures, and neglected the impact of operational errors such as
misconfigurations. Our recent measurement effort revealed three
specific types of misconfigurations in DNS today: lame delegation,
diminished server redundancy, and cyclic zone dependency. Zones with
configuration errors suffer from reduced availability and increased
query delays up to an order of magnitude. Furthermore, while the
original DNS design assumed that
redundant DNS servers fail independently, our measurements show that
operational choices made at individual zones can severely affect the
availability of other zones. We found that, left unchecked, DNS
configuration errors are widespread, with lame delegation affecting 15%
of the DNS zones, diminished server redundancy being even more
prevalent, and cyclic dependency appearing in 2% of the zones. We also
noted that the degrees of misconfiguration vary from zone to zone, with
most popular zones having the lowest percentage of errors. Our results
indicate that DNS, as well as any other truly robust large-scale
system, must include systematic checking mechanisms to cope with
operational errors.

The Design and Implementation of a Next Generation Name Service for the Internet

Venugopalan Ramasubramanian (Cornell), Emin Gun Sirer (Cornell)

Abstract:
Name services are
critical for mapping logical resource names to physical resources in
large-scale distributed systems. The Domain Name System (DNS)
used on the Internet, however, is slow, vulnerable to denial of service
attacks, and does not support fast updates. These problems stem
fundamentally from the structure of the legacy DNS.
This paper describes the design and implementation of the Cooperative
Domain Name System (CoDoNS), a novel name service, which provides high
lookup performance through proactive caching, resilience to denial of
service attacks through automatic load-balancing, and fast propagation
of updates. CoDoNS derives its scalability, decentralization,
self-organization, and failure resilience from peer-to-peer overlays,
while it achieves high performance using the Beehive replication
framework. Cryptographic delegation, instead of host-based
physical delegation, limits potential malfeasance by namespace
operators and creates a competitive market for namespace
management. Backwards compatibility with existing protocols and
wire formats enables CoDoNS to serve as a backup for legacy DNS, as
well as a complete replacement. Performance measurements from a
real-life deployment of the system on PlanetLab show that CoDoNS provides fast lookups, automatically reconfigures around faults without manual involvement, and thwarts distributed denial of service attacks by
promptly redistributing load across nodes.

A Layered Naming Architecture for the Internet

Hari Balakrishnan (MIT), Karthik Lakshminarayanan (UC Berkeley), Sylvia Ratnasamy (Intel Research), Scott Shenker (ICIR & UC Berkeley), Ion Stoica (UC Berkeley), Michael Walfish (MIT)

Abstract:
Currently the
Internet has only one level of name resolution, DNS, which converts
user-level domain names into IP addresses. In this paper we
borrow liberally from the literature to argue that there should be
three levels of name resolution: from user-level descriptors to service
identifiers; from service identifiers to endpoint identifiers; and from
endpoint identifiers to IP addresses. These additional levels of naming
and resolution (1) allow services and data to be first class Internet
objects (in that they can be directly and persistently named),
(2) seamlessly
accommodate mobility and multi-homing, and (3) integrate middleboxes
(such as NATs and firewalls) into the Internet architecture. We
further argue that flat names are a natural choice for the service and
endpoint identifiers. Hence, this architecture requires scalable
resolution of flat names, a capability that distributed hash tables
(DHTs) can provide.
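
Schematically, name resolution then becomes a three-step chain. The placeholder sketch below uses plain dictionaries where the architecture envisions DHT-backed resolution infrastructure; every name and table in it is invented for illustration.

def resolve(user_descriptor, sid_table, eid_table, ip_table):
    sid = sid_table[user_descriptor]   # user-level descriptor -> service identifier
    eid = eid_table[sid]               # service identifier -> endpoint identifier
    ip = ip_table[eid]                 # endpoint identifier -> IP address
    return ip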

Mercury: Supporting Scalable Multi-Attribute Range Queries

Ashwin R. Bharambe (CMU), Mukesh Agrawal (CMU), Srinivasan Seshan (CMU)

Abstract:
This paper presents
the design of Mercury, a scalable protocol for supporting
multi-attribute range-based searches. Mercury differs from previous
range-based query systems in that it supports multiple attributes as well as
performs explicit load balancing.
To guarantee efficient routing and load balancing, Mercury uses novel
light-weight sampling mechanisms for uniformly sampling random nodes in
a highly dynamic overlay network. Our evaluation shows that Mercury is
able to achieve its goals of logarithmic-hop routing and near-uniform
load balancing.
We also show that Mercury can be used to solve a key problem for an
important class of distributed applications: distributed state
maintenance for distributed games. We show that the Mercury-based
solution is easy to use, and that it reduces the game's messaging
overhead significantly compared to a naive approach.

Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks

Dongyu Qiu (U. Illinois at Urbana-Champaign), R. Srikant (U. Illinois at Urbana-Champaign)

Abstract:
In this paper, we
develop simple models to study the performance of BitTorrent, a second
generation peer-to-peer (P2P) application. We first present a simple
fluid model and study the scalability, performance and efficiency of
such a file-sharing mechanism. We then consider the built-in incentive
mechanism of BitTorrent and study its effect on network
performance. We also provide numerical results based on both
simulations and real traces obtained from the Internet.

A Scalable Distributed Information Management System

Praveen Yalagandula (UT Austin), Mike Dahlin (UT Austin)

Abstract:
We present a
Scalable Distributed Information Management System (SDIMS) that
aggregates information about large-scale networked systems and that can
serve as a basic building block for a broad range of large-scale
distributed applications by providing detailed views of nearby
information and summary views of global information. To serve as a
basic building block, a SDIMS should have four properties: scalability
to many nodes and attributes, flexibility to accommodate a broad range
of applications, administrative isolation for security and
availability, and robustness to node and network failures. We design,
implement and evaluate a SDIMS that (1) leverages Distributed Hash
Tables (DHT) to create scalable aggregation trees, (2) provides
flexibility through a simple API that lets applications control
propagation of reads and writes, (3) provides administrative isolation
through simple extensions to current DHT algorithms, and (4) achieves
robustness to node and network reconfigurations through lazy
reaggregation, on-demand reaggregation, and tunable spatial
replication. Through extensive simulations and micro-benchmark
experiments, we observe that our system is an order of magnitude more
scalable than existing approaches, achieves isolation properties at the
cost of modestly increased read latency in comparison to flat DHTs, and
gracefully handles failures.
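
The aggregation abstraction can be pictured as a reduce up a tree. In the sketch below the tree is an explicit dictionary, whereas in SDIMS it is induced by the DHT and the attribute being aggregated; the function names and example values are assumptions of this illustration.

def aggregate(tree, node, local_values, combine):
    # tree: maps a node to its children; local_values: each node's own input.
    child_results = [aggregate(tree, child, local_values, combine)
                     for child in tree.get(node, [])]
    return combine([local_values[node]] + child_results)

# Example: total free disk space reported beneath the root of a small tree.
tree = {"root": ["a", "b"], "a": ["a1"], "b": []}
values = {"root": 0, "a": 40, "a1": 25, "b": 10}
print(aggregate(tree, "root", values, sum))   # -> 75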