ACM SIGCOMM '97
|Sunday, September 14, 1997|
|Monday, September 15, 1997|
|6:00 pm||8:00 pm||Welcome reception|
|Tuesday, September 16, 1997|
|9:00 am||10:15 am||Introduction|
|10:45 am||12:15||Fast Routers|
|1:45 pm||3:15 pm||Wireless Networks|
|3:45 pm||5:15 pm||Novel Architectures|
|5:30 pm||6:30 pm||ACM SIGCOMM internal session|
|7:00 pm||10:00 pm||Pool Session|
|Wednesday, September 17, 1997|
|9:00 am||10:00 am||Keynote|
|10:30 am||12:00||Performance Measurement|
|1:30 pm||3:00 pm||Web and TCP performance|
|3:30 pm||5:00 pm||Miscellaneous Topics|
|5:15 pm||6:45 pm||Active Networking panel|
|8:00 pm||11:00 pm||Conference dinner|
|Thursday, September 18, 1997|
|11:00 am||12:30 pm||Multicast|
|12:30 pm||1:00 pm||Closing Session|
Measurement-based Admission Control (MBAC) is an attractive mechanism to concurrently offer Quality of Service (QoS) to users, without requiring a-priori traffic specification and on-line policing. However, the measurement process is a source of error, as we have to rely on estimated traffic parameters.
We consider the impact of estimation errors on the performance of measurement-based admission control. We study a sequence of increasingly sophisticated models that take into account traffic correlation, memory in the estimation process, and flow arrival and departure dynamics. We validate our results through extensive simulation.
We show that a certainty equivalence assumption, i.e., assuming that the measured parameters are the real ones, can grossly compromise the target performance of the system. Furthermore, we can explicitly quantify the impact on performance of measurement errors. Our results yield valuable new insight into the performance of proposed MBAC schemes, and into the design of schemes that are robust to estimation errors.
Datagram services provide a simple, flexible, robust, and scalable communication abstraction; their usefulness has been well demonstrated by the success of IP, UDP, and RPC. Yet, the overwhelming majority of network security protocols that have been proposed are geared towards connection-oriented communications. The few that do cater to datagram communications tend to either rely on long term host-pair keying or impose a session-oriented (i.e., requiring connection setup) semantics.
Separately, the concept of flows has received a great deal of attention recently, especially in the context of routing and QoS. A flow characterizes a sequence of datagrams sharing some pre-defined attributes. In this paper, we advocate the use of flows as a basis for structuring secure datagram communications. We support this by proposing a novel protocol for datagram security based on flows. Our protocol achieves zero-message keying, thus preserving the connectionless nature of datagram, and makes use of soft state, thus providing the efficiency of session-oriented schemes. We have implemented an instantiation for IP in the 4.4BSD kernel, and we provide a description of our implementation along with performance results.
In this paper, we propose a scheduling algorithm that to the best of our knowledge is the first that can support simultaneously (a) hierarchical link-sharing service, (b) guaranteed real-time service with provable tight delay bounds, and (c) decoupled delay and bandwidth allocation (which subsumes priority scheduling). This is achieved by defining and incorporating fairness property, which is essential for link-sharing, into Service-Curve based schedulers, which can decouple the allocation of bandwidth and delay. We call the hierarchical version of the resulted algorithm a Hierarchical Fair Service Curve (H-FSC) Algorithm. We analyze the performance of H-FSC and present simulation results to demonstrate the advantages of H-FSC over previously proposed algorithms such as H-PFQ and CBQ. Preliminary experimental results based on a prototype implementation in NetBSD are also presented.
Recently there has been much interest in combining the speed of layer-2 switching with the features of layer-3 routing. This has been prompted by numerous proposals, including: IP Switching, Tag Switching, ARIS, CSR, and IP over ATM. In this paper, we study IP Switching and evaluate the performance claims made by Newman et al in  and . In particular, using nine network traces, we study how well IP Switching performs with traffic found in campus, corporate, and Internet Servicer Provide (ISP) environments. Our main finding is that IP Switching will lead to a high proportion of packets that are switched; over 75% in all of the environments we studied. We also investigate the effects that different flow classifiers and various timer values have on performance, and note that some choices can result in a large VC space requirement. Finally, we present recommendations for the flow classifier and timer values, as a function of the VC space of the switch and the network environment being served.
Active networks accelerate network evolution by permitting the network infrastructure to be programmable, on a per-user, per-packet, or other basis. This programmability must be balanced against the safety and security needs inherent in shared resources.
This paper describes the design, implementation, and performance of a new type of network element, an Active Bridge. The active bridge can be reprogrammed on the fly, with loadable modules called Switchlets. We incrementally extend what is initially a programmable buffered repeater with Switchlets into a self-learning bridge, a bridge supporting spanning tree algorithms, and a bridge supporting multiple spanning tree algorithms. To demonstrate the use of the active property, we implemented an algorithm to allow the coordinated shift from a DEC-like spanning tree protocol to our 802.1D implementation with provisions for automatic fall back if an error is detected in the 802.1D implementation. This shows that the Active Bridge can protect itself from some algorithmic failures in loadable modules.
Our approach to safety and security prefers static checking and prevention over dynamic checks when possible. We rely on strong type checking features of the Caml language for the loadable module infrastructure, and achieved respectable performance. The prototype implementation on a 4 processor Pentium-based HP Netserver LS running LINUX with 100 Mbps Ethernet LANS achieves ttcp throughput of 15 Mbps between two PCs running LINUX, compared with 45 Mbps unbridged.
We describe tcpanaly, a tool for automatically analyzing a TCP implementation's behavior by inspecting packet traces of the TCP's activity. Doing so requires surmounting a number of hurdles, including detecting packet filter measurement errors, coping with ambiguities due to the distance between the measurement point and the TCP, and accommodating a surprisingly large range of behavior among different TCP implementations. We discuss why our efforts to develop a fully general tool failed, and detail a number of significant differences among 8 major TCP implementations, some of which, if ubiquitous, would devastate Internet performance. The most problematic TCPs were all independently written, suggesting that correct TCP implementation is fraught with difficulty. Consequently, it behooves the Internet community to develop testing programs and reference implementations.
Byte stuffing is a process that transforms a sequence of data bytes that may contain 'illegal' or 'reserved' values into an equivalent (but potentially longer) sequence that contains no occurrences of those values. This potential increase in length is unavoidable overhead if we wish to eliminate byte values without loss of information, but the exact amount of overhead varies according to the algorithm being used. With current byte stuffing algorithms, such as used by SLIP [RFC1055], PPP [RFC1661] and AX.25 [ARRL84], the overhead incurred is extremely variable: some sequences incur no overhead at all while at the other extreme, some sequences double in size.
This variability of encoding overhead forces us to over-engineer networking hardware and software to ensure that we will be able to transmit all legal data packets, including the few that double in size when encoded. Devices like radio transmitters on portable computers have tight budgets for cost, weight, size, and power consumption, and over-engineering their transmission capabilities by a factor of two is not an acceptable option. The increasing popularity of these devices makes byte stuffing overhead a more important problem than it has been in the past.
This paper presents a byte stuffing algorithm that has a much tighter bound on its worst-case performance. The algorithm's overall performance is comparable with SLIP byte stuffing, but it has the advantage that it is guaranteed, even in the absolute worst case, to add an overhead of no more than one byte in 254 to any packet. For example, on a 1000 byte packet it adds at most 4 bytes, an overhead of 0.4%. A simple variant of this algorithm consistently beats SLIP's overall average performance while still guaranteeing a worst-case per-packet overhead of no more than one byte in 223 (about 0.45%).
In this paper, we demonstrate the effectiveness of Random Early Detection (RED) on three different traffic types: greedy, bursty and adaptive. We point out that RED contributes to unfair bandwidth sharing when a mixture of the three traffic types exist at the shared gateway. This unfairness is caused by the fact that RED proportionally drops packets from the competing connections according to their bandwidth usages. We propose Fair Random Early Drop (FRED), a modified version of RED. FRED detects incipient congestion by monitoring the average queue length and decides whether to drop a packet when the running average exceeds a threshold. While connections in RED experience the same packet loss rate, FRED makes different dropping decisions over connections with different bandwidth usages by use of per-active- connection accounting. We show that FRED provides better protections for bursty connections. In addition, FRED is able to isolate ill-behaved greedy traffic more effectively. These improvements are demonstrated by simulations of TCP and UDP traffic. FRED does not make any assumptions on the queueing architecture. The cost of FRED's per-active-connection accounting is only proportional to the total buffer size and is independent of the number of connections.
This paper proves the existence of and explicitly determines effective bandwidths for a class of non Markovian fluid sources, featuring multiple data-transmission rates and arbitrary distributions for the times these rates are sustained. The investigated models cover considerably more traffic profiles than the usual Markovian counterparts and have reduced state-space requirements. The effective bandwidth, as a function of the asymptotic loss probability decay rate, is implicitly derivable by the requirement that the spectral radius of an appropriate nonnegative matrix be equal to unity. The effective bandwidth function is shown to be, either strictly increasing, or constant and equal to the mean rate. Sources of the second kind, which are characterized, generalize the notion of CBR traffic. Furthermore, a study for the limiting effective bandwidth, towards a loss-less environment, is undertaken; it is shown that the limiting value may, under some fully identified restrictions on the source behavior, be less than the source's peak rate. Under those restrictions, a source may have reduced bandwidth requirements, even if it features a large peak rate.
We discuss findings from a large-scale study of Internet packet dynamics conducted by tracing 20,000 TCP bulk transfers between 36~Internet sites. Because we traced each 100 Kbyte transfer at both the sender and the receiver, the measurements allow us to distinguish between the end-to-end behaviors due to the different directions of the Internet paths, which often exhibit asymmetries. We characterize the prevalence of unusual network events such as out-of-order delivery, packet replication, and packet corruption; develop a robust receiver-based algorithm for estimating bottleneck bandwidth, which addresses deficiencies discovered in techniques based on packet pair; investigate patterns of packet loss, finding that loss events are not well-modeled as independent and, furthermore, that the distribution of the duration of loss events exhibits infinite variance; and analyze variations in packet transit delays as indicators of congestion periods, finding that congestion periods also span a wide range of time scales.
Fair scheduling of delay and rate-sensitive packet flows over a wireless channel is not addressed effectively by most contemporary wireline fair scheduling algorithms because of two unique characteristics of wireless media: (a) bursty channel errors, and (b) location-dependent channel capacity and errors. Besides, in packet cellular networks, the base station typically performs the task of packet scheduling for both downlink and uplink flows in a cell; however a base station has only a limited knowledge of the arrival processes of uplink flows.
In this paper, we propose a new model for wireless fair scheduling based on an adaptation of fluid fair queueing to handle location-dependent error bursts. We describe an ideal wireless fair scheduling algorithm which provides a packetized implementation of the fluid model while assuming full knowledge of the current channel conditions. For this algorithm, we derive the worst-case throughput and delay bounds. Finally, we describe a practical wireless scheduling protocol which closely approximates the ideal algorithm, addresses several implementation-related wireless medium access issues, and uses simple one-step channel prediction. Through a large number of simulations, we show that the protocol achieves the desirable properties identified in the wireless fluid fair queueing model.
For many applications it is important to provide communication services with guaranteed timeliness and fault-tolerance at an acceptable level of overhead or cost. We present a scheme for restoring real-time channels, each with guaranteed timeliness, from component failures in multi-hop networks. To ensure successful recovery and avoid the need of time-consuming channel re-establishment, backup channels are set up a priori in addition to each primary channel. That is, a dependable real-time connection consists of a primary channel and one or more backup channels. In case a primary channel fails, one of its backup channels is activated to become a new primary channel. We propose an efficient protocol which provides an integrated solution (i.e., channel switching, resource re-allocation, and so on) to the failure-recovery problem. We also develop a resource sharing method that significantly reduces the overhead of backup channels. The simulation results show that good coverage (in recovering from failures) can be achieved with 10--30% degradation in network utilization under a reasonable failure condition. Moreover, the fault-tolerance level of each dependable connection can be controlled to reflect its criticality, which is one of the unique features of the proposed scheme.
A number of analyses     of HTTP/1.0 and proposals influenced HTTP/1.1's design.
Major goals HTTP/1.1 included:
We describe our investigation of the effect of both persistent connections and pipelining improvements of both our client and server HTTP implementations, using a simple test setup, both to verify HTTP1.1's design and understand HTTP/1.1 implementation strategies. We present TCP and real time performance data between the Jigsaw server using HTTP/1.1 and the libwww robot using HTTP/1.0, HTTP/1.1 with persistent connections, and HTTP/1.1 with pipelined requests. We also investigate whether the TCP Nagle algorithm has an effect on HTTP/1.1 performance. While somewhat artificial and possibly overstating the benefits of HTTP/1.1, we believe the tests and results approximate some common behavior seen in browsers. The results confirm that HTTP/1.1 is meeting its major design goals.
For our tests, HTTP/1.1 implemented with pipelining outperformed HTTP/1.0, even when the HTTP/1.0 implementation used multiple connections in parallel, under all network environments tested. The savings is typically at least a factor of two, and often much more, in terms of packets transmitted. Elapsed time improvement is less dramatic, but significant. All HTTP/1.1 tests using pipelining and a single connection out performed HTTP/1.0 using multiple connections.
This paper examines the backbone inter-domain routing information exchanged between network service providers at the major U.S. public Internet exchange points. Internet routing instability, or the rapid fluctuation of network reachability information, is one of the more critical problems currently facing the Internet engineering community. High levels of network instability can lead to packet loss and increased network latency and time to convergence. At the extreme, high levels of routing instability have lead to the loss of internal connectivity in wide-area, national networks. In this paper, we describe several unexpected trends in routing instability, and examine a number of anomalies and pathologies observed in the exchange of inter-domain routing information. This paper provides analysis of all BGP routing packets exchanged between border routers at the Internet core's public exchange points during a nine month period. We show that the volume of these routing updates is several orders of magnitude more than expected and that the majority of this routing information is pathological. Furthermore, our analysis reveals several unexpected trends and ill-behaved systematic properties in Internet routing. We finally posit a number of explanations for these anomalies and evaluate their potential impact on the Internet infrastructure.
As multicast applications are deployed for mainstream use, the need to secure multicast communications will become critical. Multicast, however, does not fit the point-to-point model of most network security protocols which were designed with unicast communications in mind. As we will show, securing multicast (or group) communications is fundamentally different from securing unicast (or paired) communications. In turn, these differences can result in scalability problems for many typical applications. In this paper, we examine and model the differences between unicast and multicast security and then propose Iolus: a novel framework for scalable secure multicasting. Protocols based on Iolus can be used to achieve a variety of security objectives and may be used either to directly secure multicast communications or to provide a separate group key management service to other "security-aware" applications. We describe the architecture and operation of Iolus in detail and also describe our experience with a protocol based on the Iolus framework.
IP Multicast, Lightweight Sessions and Application Level Framing provide guidelines by which multimedia conferencing tools can be designed, but they do not provide specific solutions. In this paper, we use these design principles to guide the design of a multicast based shared editor, and examine the consequences of taking a loose consistency approach to achieve good performance in the face of network failures and losses.
In this paper we investigate how FEC (Forward Error Recovery) and ARQ (Automatic Repeat Request) can be used in combination to achieve scalable reliable multicast transmission. We consider the two scenarios where FEC is introduced as a transparent sublayer underneath a reliable multicast layer that uses ARQ, and where FEC and ARQ are both integrated into a single layer that uses retransmission of parity data to recover from the loss of original data packets.
To evaluate the performance improvements due to FEC, we consider different types of loss behaviors (spatially or temporally correlated loss, homogeneous or heterogeneous loss) and loss rates for up to 10^6 receivers. Our results show that augmenting ARQ with FEC greatly improves multicast transmission efficiency and scalability.
Caching in the World Wide Web currently follows a naive model, which assumes that resources are referenced many times between changes. The model also provides no way to update a cache entry if a resource does change, except by transferring the resource's entire new value. Several previous papers have proposed updating cache entries by transferring only the differences, or delta, between the cached entry and the current value.
In this paper, we make use of dynamic traces of the full contents of HTTP messages to quantify the potential benefits of delta-encoded responses. We show that delta-encoding can provide remarkable improvements in response-size and response-delay for an important set of HTTP content types. We also show the added benefit of data compression, and that the combination of delta-encoding and data compression yields the best results.
We propose specific extensions to the HTTP protocol for delta-encoding and data compression. These extensions are compatible with existing implementations and specifications, yet allow efficient use of a variety of encoding techniques.
Internet address lookup is a challenging problem because of increasing routing table sizes, increased traffic, higher speed links, and the migration to 128 bit IPv6 addresses. IP routing lookup requires computing the best matching prefix, for which standard solutions like hashing were believed to be inapplicable. The best existing solution we know of, BSD radix tries, scales badly as IP moves to 128 bit addresses. Our paper describes a new algorithm for best matching prefix using binary search on hash tables organized by prefix lengths. Our scheme scales very well as address and routing table sizes increase: independent of the table size, it requires a worst case time of log(address bits) hash lookups; thus only 5 hash lookups are needed for IPv4 and 7 for IPv6. We also introduce Mutating Binary Search and other optimizations that, for a typical IPv4 backbone router with over 33,000 entries, considerably reduce the average number of hashes to less than 2, of which one hash can be simplified to an indexed array access. We expect similar average case behavior for IPv6.
We consider the problem of supporting multipoint communication at the media access control (MAC) layer of broadcast-and-select WDM networks. We first show that bandwidth consumption and channel utilization arise as two conflicting objectives in the design of scheduling algorithms for multicast traffic in this environment.
We then present a new technique for the transmission of multicast packets, based on the concept of a virtual receiver, a set of physical receivers which behave identically in terms of tuning.
We also show that the number k of virtual receivers naturally captures the tradeoff between channel utilization and bandwidth consumption. Our approach decouples the problem of determining the virtual receivers from the problem of scheduling packet transmissions, making it possible to employ existing scheduling algorithms that have been shown to successfully hide the effects of tuning latency. Consequently, we focus on the problem of optimally selecting the virtual receivers, and we prove that it is NP-complete.
Finally, we present four heuristics of varying degrees of complexity for obtaining virtual receivers that provide a good balance between the two conflicting objectives.
We investigate a novel multicast technique, called Skyscraper Broadcasting (SB), for video-on-demand applications. We discuss the data fragmentation technique, the broadcasting strategy, and the client design. We also show the correctness of our technique, and derive mathematical equations to analyze its storage requirement. To assess its performance, we compare it to the latest designs known as Pyramid Broadcasting (PB) and Permutation-Based Pyramid Broadcasting (PPB). Our study indicates that PB offers excellent access latency. However, it requires very large storage space and disk bandwidth at the receiving end. PPB is able to address these problems. However, this is accomplished at the expense of a larger access latency and more complex synchronization. With SB, we are able to achieve the low latency of PB while using only 20% of the buffer space required by PPB.
For some time, the Internet community has believed that it is impossible to do IP routing lookups in software fast enough to support gigabit speeds. IP routing lookups must find the routing entry with the longest matching prefix, a task that has been thought to require hardware support at lookup frequencies of millions per second. We present a forwarding table data structure designed for quick routing lookups. Forwarding tables are small enough to fit in the cache of a conventional general purpose processor. With the table in cache, a 200 MHz Pentium Pro or a 333 MHz Alpha 21164 can perform a few million lookups per second. This means that it is feasible to do a full routing lookup for each IP packet at gigabit speeds without special hardware. The forwarding tables are very small, a large routing table with 40,000 routing entries can be compacted to a forwarding table of 150--160 Kbytes. A lookup typically requires less than 100 instructions on an Alpha, using eight memory references accessing a total of 14 bytes.
A new channel access discipline called floor acquisition multiple access (FAMA) is introduced. According to FAMA, control of the channel (the floor) is assigned to at most one station in the network at any given time, and this station is guaranteed to be able to transmit one or more data packets to different destinations with no collisions. We describe FAMA protocols consisting of non-persistent carrier or packet sensing, plus a collision-avoidance dialogue between a source and the intended receiver of a packet. Sufficient conditions under which these protocols provide correct floor acquisition are presented and verified for networks with hidden terminals. It is shown that FAMA protocols must use carrier sensing to support correct floor acquisition in the presence of hidden terminals. The throughput of FAMA protocols is analyzed for single-channel networks with hidden terminals; the analysis shows that carrier-sensing FAMA protocols perform better than ALOHA and CSMA protocols in the presence of hidden terminals.
Subjecting a mobile computing system to wireless network conditions that are realistic yet reproducible is a challenging problem. In this paper, we describe a technique called trace modulation that recreates the observed end-to-end characteristics of a real wireless network in a controlled and repeatable manner. Key attributes of our methodology are that it is transparent to applications and accounts for all network traffic sent or received by the system under test. We present the results of experiments to show that it is indeed capable of reproducing wireless network performance accurately.