USF Libraries
USF Digital Collections

A Web100-Dummynet testbed for education and research in transport layer protocols


Material Information

Title:
A Web100-Dummynet testbed for education and research in transport layer protocols
Physical Description:
Book
Language:
English
Creator:
Bassi, Steven
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:
2005
Subjects

Subjects / Keywords:
TCP
Networks
Linux
Performance
Computers
Dissertations, Academic -- Computer Science -- Masters -- USF
Genre:
bibliography (marcgt)
theses (marcgt)
non-fiction (marcgt)

Notes

Abstract:
ABSTRACT: With recent developments in technology broadening the complexity and performance issues of computer networks, more work is being put into studying the behavior of network protocols under various new environments. As it stands today, TCP is the dominant protocol for communicating over the Internet. It has been shown to be reliable and efficient in the relatively low-bandwidth networks it was designed for. With the increasing availability of high-speed optical, wireless, and ad hoc networks, more research is being done to study new TCP variants that will perform well under these new environments. In order to perform such work, researchers require tools to test and observe the properties of the protocols they are studying. This is the case of Web100 [1, 2], a software patch that provides the instrumentation and tools to study and diagnose TCP variables, and Dummynet [3], a tool for shaping network traffic that can be used in conjunction with Web100 to emulate different environments and network conditions. Although these two powerful tools have gained a lot of attention lately, installing and setting up this testbed is not a straightforward task. In this thesis, this is addressed by providing a guide to help researchers and educators reduce the installation time. In addition, the educational benefits of the testbed are shown by conducting a number of experiments under different network conditions and analyzing the behavior of the major mechanisms in TCP. Research potential is demonstrated by integrating the experimental SF-SACK [4, 5] protocol into the testbed and comparing its performance against SACK [6], another more widely used version of TCP. Using the Web100-Dummynet testbed, results are shown illustrating performance measurements such as throughput, fairness, and smoothness.
Thesis:
Thesis (M.S.)--University of South Florida, 2005.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Steven Bassi.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 71 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001912425
oclc - 173843311
usfldc doi - E14-SFE0001282
usfldc handle - e14.1282
System ID:
SFS0025603:00001




Full Text


A Web100-Dummynet Testbed For Education And Research In Transport Layer Protocols

by

Steven Bassi

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, Department of Computer Science and Engineering, College of Engineering, University of South Florida

Major Professor: Miguel A. Labrador, Ph.D.
Kenneth Christensen, Ph.D.
Dewey Rundus, Ph.D.

Date of Approval: July 7, 2005

Keywords: tcp, networks, linux, performance, computers

Copyright 2005, Steven Bassi

Table of Contents

List of Tables
List of Figures
Abstract

Chapter 1 Introduction
  Introduction and Motivations
  Contributions of the Thesis
  Outline of the Thesis

Chapter 2 Literature Review
  The Transmission Control Protocol (TCP)
  The Linux and FreeBSD Operating Systems
  Network Simulation and Testbeds
  Web100 and Dummynet

Chapter 3 Setting up a Web100-Dummynet Testbed
  Testbed Topology and Specifications
  Configuring a Web100 Client
  Configuring a Dummynet Router

Chapter 4 Education and Research in Transport Layer Protocols (TLP)
  Educational Experiments and Results
    Analysis of a Single TCP Flow under Various Network Conditions
      Different Buffer Sizes
      Different Bottleneck Link Speeds
      Different Packet Loss Ratios
      Broadband Connection across an Overseas Link
  Research Experiments and Results
    The SF-SACK Protocol
    How to Incorporate a New Protocol into Web100: The SF-SACK Case
    Comparison of TCP SACK and TCP SF-SACK
      Fairness and Smoothness of Competing Flows
      Different Queuing Strategies
      Weighted Bandwidth Queue

Chapter 5 Conclusions

References

Bibliography

Appendices
  Appendix A: Source Code for Implementation of SF-SACK in the Linux 2.4.26 Kernel

List of Tables

Table 3.1 Testbed Hardware Specifications
Table 4.1 Mean Throughput and Standard Deviation of Competing SACK and SF-SACK Connections

List of Figures

Figure 2.1 TCP Connection Establishment through Exchange of SYN and ACK
Figure 2.2 Behavior of TCP Congestion Window and Slow-Start Threshold
Figure 3.1 Network Layout of Testbed
Figure 3.2 Screenshot of the Web100 Userland
Figure 3.3 Trace Daemon Monitoring an FTP Connection
Figure 3.4 Excerpt from Trace Daemon Conf File
Figure 4.1 15Mbps with 5 Packet Buffer
Figure 4.2 15Mbps with 20 Packet Buffer
Figure 4.3 15Mbps with 75 Packet Buffer
Figure 4.4 15Mbps Bottleneck
Figure 4.5 60Mbps Bottleneck
Figure 4.6 Instantaneous and Average Throughput at 15Mbps, PLR .005
Figure 4.7 Instantaneous and Average Throughput at 15Mbps, PLR .01
Figure 4.8 Instantaneous and Average Throughput at 15Mbps, PLR .10
Figure 4.9 Throughput of 5Mbps Upstream, 384Kbps Downstream, 240ms RTT
Figure 4.10 Avg. Throughput of 5Mbps Upstream, 384Kbps Downstream, 240ms RTT
Figure 4.11 Congestion Window of 5Mbps Upstream, 384Kbps Downstream, 240ms RTT

Figure 4.12 RTT and RTO of 5Mbps Upstream, 384Kbps Downstream, 240ms RTT
Figure 4.13 SACK vs. SACK, 15Mbps, 20ms RTT ns2 Simulation
Figure 4.14 SF-SACK vs. SF-SACK, 15Mbps, 20ms RTT ns2 Simulation
Figure 4.15 SACK vs. SACK, 15Mbps, 20ms RTT
Figure 4.16 SF-SACK vs. SF-SACK, 15Mbps, 20ms RTT
Figure 4.17 SACK vs. SF-SACK, 15Mbps, 20ms RTT
Figure 4.18 Congestion Window of SACK vs. SACK
Figure 4.19 Congestion Window of SACK vs. SF-SACK
Figure 4.20 Congestion Window of SF-SACK vs. SF-SACK
Figure 4.21 SF-SACK vs. SF-SACK, DropTail Queue
Figure 4.22 SF-SACK vs. SF-SACK, RED Queue
Figure 4.23 Throughput of SF-SACK vs. SF-SACK, 40:20 Weighted Queue
Figure 4.24 Avg. Throughput of SF-SACK vs. SF-SACK, 40:20 Weighted Queue

A Web100-Dummynet Testbed for Education and Research in Transport Layer Protocols

Steven Bassi

ABSTRACT

With recent developments in technology broadening the complexity and performance issues of computer networks, more work is being put into studying the behavior of network protocols under various new environments. As it stands today, TCP is the dominant protocol for communicating over the Internet. It has been shown to be reliable and efficient in the relatively low-bandwidth networks it was designed for. With the increasing availability of high-speed optical, wireless, and ad hoc networks, more research is being done to study new TCP variants that will perform well under these new environments. In order to perform such work, researchers require tools to test and observe the properties of the protocols they are studying. This is the case of Web100 [1, 2], a software patch that provides the instrumentation and tools to study and diagnose TCP variables, and Dummynet [3], a tool for shaping network traffic that can be used in conjunction with Web100 to emulate different environments and network conditions. Although these two powerful tools have gained a lot of attention lately, installing and setting up this testbed is not a straightforward task. In this thesis, this is addressed by providing a guide to help researchers and educators reduce the installation time. In addition, the educational benefits of the testbed are shown by conducting a number of

experiments under different network conditions and analyzing the behavior of the major mechanisms in TCP. Research potential is demonstrated by integrating the experimental SF-SACK [4, 5] protocol into the testbed and comparing its performance against SACK [6], another more widely used version of TCP. Using the Web100-Dummynet testbed, results are shown illustrating performance measurements such as throughput, fairness, and smoothness.

Chapter 1 Introduction

Introduction and Motivations

Transport layer protocols are a fundamental part of layered network architectures. Residing between the application and network layers, the transport layer provides logical communication between applications and transparency that makes it seem like two end hosts are directly connected, when in reality there may be tens or hundreds of intermediate network devices between them. The Internet relies on two transport layer protocols, the transmission control protocol (TCP) and the user datagram protocol (UDP), which provide different sets of transport layer services.

UDP is a connectionless protocol used for services that require fast, best-effort data delivery. Such applications include streaming audio/video, domain name system (DNS) lookups, or online gaming, where it is acceptable if some packet loss occurs as long as the delivery rate is fast. On the other hand, TCP provides connection-oriented services and reliable data delivery. TCP uses a number of mechanisms to ensure that data from a sending application is received complete and in order at the receiving host.

TCP also includes features that allow it to adapt to less than ideal network conditions. It automatically limits the rate of data being sent whenever packet loss is detected. It uses packet loss as an indication that congestion exists in the network and assumes that reducing the size of its congestion window is the appropriate

action to take. This has proven to work very well in lower bandwidth networks where speeds will not reach tens or hundreds of megabits per second, but it has become a problem in recent years. In optical or high bandwidth networks, where TCP's congestion control mechanisms cause it not to utilize the amount of bandwidth that may actually be available, end users may see a reduction in performance and incorrectly attribute it to a flawed or congested network. Users cannot see the inner workings of TCP and therefore cannot diagnose the problem correctly. Similar problems exist in wireless and ad hoc networks, and much research has gone into solving TCP's performance issues under these specific circumstances [7-14].

Most of the research done in transport layer protocols involves the use of network simulators. NS2 [15] is a very popular open source simulator that allows researchers to experiment with modifications made to TCP and test them under various network conditions. Users can simulate such things as wireless links, high bandwidth connections, various packet loss rates, buffer sizes, queuing strategies, and so on. Other simulators exist and provide similar services to researchers and educators; however, tools for studying real-life implementations are not so readily available.

Contributions of the Thesis

This thesis includes the following contributions to research and education in transport layer protocols:

- A guide to setting up a Web100-Dummynet testbed to study the behavior of the underlying mechanisms in TCP.

- An introduction to modifying TCP in the Linux kernel, as shown in the TCP SF-SACK case.
- Results from experiments demonstrating how this testbed can be used to study the behavior of TCP as well as to compare the performance of experimental TCP versions.

Outline of the Thesis

The rest of this thesis is organized as follows. Chapter 2 provides a literature review of TCP as well as the Linux and FreeBSD operating systems. Also included in this chapter are introductions to network simulators and testbeds as well as Web100 and Dummynet. Chapter 3 provides specifications for the Web100-Dummynet testbed as well as a guide for installing and configuring the tools. In Chapter 4, the results from educational and research oriented experiments are given. A description of the SF-SACK protocol and information on its implementation in the Linux kernel is also provided. Conclusions are provided in Chapter 5 along with ideas for future work in this area.

Chapter 2 Literature Review

The Transmission Control Protocol (TCP)

Reliable, connection-oriented data delivery is fundamental to many Internet applications. However, obstacles such as network congestion and packet loss can make it challenging for data to be delivered in this manner. Through its use of flow control, sequence numbers, acknowledgements, and timers, TCP addresses the problem of reliably delivering data over IP's generally unreliable service. Additionally, congestion control mechanisms in TCP attempt to fairly and efficiently utilize the available end-to-end bandwidth.

TCP is a connection-oriented protocol because it relies on the end hosts agreeing on the establishment of a communication channel before data transfer begins. It uses a three-way handshake, which involves the transmission of SYN and ACK packets, to ensure that both hosts are able and willing to begin data transfer. Figure 2.1 illustrates the exchange of these packets to establish the TCP connection.

There are some properties of a TCP connection which should be discussed. First, the connection state resides only in the two end systems. The intermediate networking devices, including routers and switches, do not need to use TCP and in fact do not care what transport layer protocol the end users are using. Secondly, the established

connection supports full-duplex transmissions. This means that the end hosts can send and receive application-layer data simultaneously.

[Figure 2.1 Establishment of a TCP Connection through Exchange of SYN and ACK: Host A sends SYN seq=x; Host B receives the SYN and replies with SYN seq=y, ACK x+1; Host A receives the SYN+ACK and completes the handshake with ACK y+1.]

Once a connection is established, TCP's flow and congestion control mechanisms handle the transfer of data between the end hosts. The sending host places TCP segments on the output buffer, where they can be passed down to the network layer, encapsulated in IP datagrams, and routed through the network. The segments are eventually placed in the TCP receive buffer of the receiving host and sent up the network stack to be processed by the application layer.

Problems might occur if the output buffer is sending data at a faster rate than can be processed by the receive buffer or if data is being sent too slowly. The former case can lead to overflows on the receiving end, while the latter would underutilize the available bandwidth of the connection. This is where TCP implements flow control as a way of limiting the data rate of the sender to a speed that the receiver can process.
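Stepping back to the application's view of connection establishment: the entire handshake shown in Figure 2.1 is hidden behind the sockets API. The following minimal C sketch is purely illustrative (the address and port are placeholder values, not part of the testbed instructions); it shows that a single connect() call is what triggers the SYN/ACK exchange described above:

/* Minimal TCP client: the three-way handshake happens inside
 * connect(); no SYN or ACK handling is visible to the application. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);      /* TCP socket */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(21);                      /* e.g., an FTP server */
    inet_pton(AF_INET, "192.168.1.2", &srv.sin_addr);

    /* connect() sends SYN, waits for SYN+ACK, and replies with ACK. */
    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        perror("connect");
        return 1;
    }
    printf("handshake complete, connection established\n");
    close(fd);
    return 0;
}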

Flow control is implemented by the sender through the use of a variable called the receive window. The receiving host keeps track of how much space is available in its receive buffer and passes that information back to the sender in the RcvWindow field of every segment it sends out. The sender uses this information to control how much data is sent out of its send buffer. It should be noted that if the receiver's buffer becomes full, the sender will continue to send segments with one data byte until an acknowledgment is returned with a nonzero RcvWindow value.

Congestion control is a mechanism used by TCP when the protocol thinks that there may be packet loss due to too much traffic on the network. If excessive traffic causes buffers in the intermediate routers to overflow, IP datagrams will be dropped. This will result in the sender perceiving congestion in the network due to three duplicate acknowledgments or a timeout event. The sender will then limit the size of its congestion window variable, cwnd, to reduce the number of segments being sent.

The algorithm used by TCP in response to this perceived network congestion is called the TCP congestion control algorithm. This algorithm features an additive-increase, multiplicative-decrease (AIMD) of the cwnd variable. It also includes a slow-start phase, which is used to exponentially increase the size of the congestion window when a TCP connection begins. This is meant to quickly make use of all of the available bandwidth of the connection. Once packet loss is detected or the congestion window reaches the value of the sender's slow-start threshold variable (ssthresh), the algorithm switches to the congestion avoidance phase and the congestion window is incremented linearly rather than exponentially.
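How the receive window and the congestion window jointly bound the sender can be captured in a few lines of C. This is an illustrative model with hypothetical names, not code from any kernel: the sender may have at most min(RcvWindow, cwnd) bytes unacknowledged at any time.

typedef unsigned int u32;

struct sender_state {
    u32 last_byte_sent;    /* highest byte handed to the network      */
    u32 last_byte_acked;   /* highest byte cumulatively acknowledged  */
    u32 rcv_window;        /* RcvWindow advertised by the receiver    */
    u32 cwnd;              /* congestion window kept by the sender    */
};

/* Bytes the sender may still transmit without violating either
 * flow control (rcv_window) or congestion control (cwnd). */
static u32 usable_window(const struct sender_state *s)
{
    u32 in_flight = s->last_byte_sent - s->last_byte_acked;
    u32 limit = s->rcv_window < s->cwnd ? s->rcv_window : s->cwnd;
    return limit > in_flight ? limit - in_flight : 0;
}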

When packet loss is detected due to three duplicate acknowledgments, the congestion window is reduced by half. If packet loss is due to a timeout, the congestion window is set to 1. In either case, ssthresh is set to half of the previous cwnd value. This behavior is illustrated in Figure 2.2.

[Figure 2.2 Behavior of TCP congestion window and slow-start threshold (cwnd and ssthresh, in segments, plotted per transmission round). A timeout event occurs at transmission round 8, while a triple-dupack event is detected at round 16.]

This congestion control algorithm was modified in the newer and more widely used TCP Reno and TCP New Reno, where the Fast Retransmit/Fast Recovery mechanisms were included. These new features set the ssthresh value to max(cwnd/2, 2) and the congestion window to ssthresh+3 whenever packet loss is detected through duplicate acknowledgments, bypassing the slow-start phase and entering congestion avoidance immediately. This provides for better performance and faster recovery from packet loss, as the name implies.
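The window rules just described can be summarized in a short C model. This is a simplified, segment-based sketch of the Reno-style logic (real stacks track bytes and handle many more corner cases), offered only to make the arithmetic concrete:

typedef unsigned int u32;

struct cc_state { u32 cwnd, ssthresh, acks_in_round; };

static u32 max_u32(u32 a, u32 b) { return a > b ? a : b; }

/* Called for each new cumulative ACK. */
static void on_ack(struct cc_state *s)
{
    if (s->cwnd < s->ssthresh) {
        s->cwnd += 1;                /* slow start: doubles cwnd each RTT */
    } else if (++s->acks_in_round >= s->cwnd) {
        s->cwnd += 1;                /* congestion avoidance: +1 per RTT  */
        s->acks_in_round = 0;
    }
}

/* Three duplicate ACKs: Fast Retransmit/Fast Recovery. */
static void on_triple_dupack(struct cc_state *s)
{
    s->ssthresh = max_u32(s->cwnd / 2, 2);
    s->cwnd = s->ssthresh + 3;       /* skip slow start entirely */
}

/* Retransmission timeout: restart from a one-segment window. */
static void on_timeout(struct cc_state *s)
{
    s->ssthresh = max_u32(s->cwnd / 2, 2);
    s->cwnd = 1;
}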

The Linux and FreeBSD Operating Systems

Linux and FreeBSD are both free, open source, Unix-like operating systems originally developed to run on processors compatible with the Intel x86 architecture. Being open source and free offers a great advantage to researchers in that they are given the flexibility to modify and configure the kernels to suit their needs. However, several differences exist between the two operating systems.

Linux was originally developed in 1991 by Linus Torvalds as a hobby while he was a student at the University of Helsinki. The kernel is written in the C programming language (with some lines of assembly code) and can be compiled using the GNU GCC compiler. In its early stages the kernel was supported by groups of enthusiasts who continued to develop libraries and tools and expand the project throughout the 1990s. Strictly speaking, the term Linux only refers to the kernel, with all of the additional libraries and tools being packaged together in separate distributions.

Linux is licensed under the GNU General Public License [16], and so all source code modifications and all derived works must also be licensed under the GPL. So while users are permitted to view and modify the source code as they please, all modifications should remain open and available to the public. As the kernel continued to evolve, it was ported over to different architectures. In addition to personal computers, it is now regularly used in supercomputer architectures (64-bit AMD Opteron, Cray XD1), embedded systems, handheld devices, and commercial devices such as routers and personal video recorders.

FreeBSD is a descendant of the original AT&T UNIX via the Berkeley Software Distribution (BSD) [17]. In contrast to Linux, FreeBSD is released as a complete operating system with all of the userland utilities (shell, device drivers, etc.) packaged

together. This means that multiple distributions packaged with different tools and software are not available as they are for Linux. Many critical technologies in Linux are derived from FreeBSD. FreeBSD is licensed under the BSD License [18], which permits anyone to use, redistribute, and modify the system as long as the copyright notice and the BSD license itself are not removed. FreeBSD is used in a variety of commercial and open source products, including Apple's Mac OS X and Juniper routers [17].

Network Simulation and Testbeds

The use of tools to study transport layer protocols is widespread among researchers and educators who wish to analyze behavior under varying network conditions. A simulator can generally be defined as a synthetic environment for imitating a real phenomenon. In the case of network protocols, the phenomenon is the behavior of real code as represented in the simulator and applied in different network scenarios. Tools like ns2, Scalable Simulation Framework (SSF) [19], GloMoSim [20], and MaRS [21] all provide simulation in varying degrees of complexity.

By contrast, live testbeds allow testing to be done using real implementations of code. This involves the use of actual equipment and realization of networks to run the desired experiments. However, it can be very difficult to control certain aspects of a network when using live equipment for experiments. Tools to control network variables as well as analyze protocol behavior in these testbeds are needed to make the experimentation process practical. Tools that can be grouped into this category include the Hitbox pseudo-device for SunOS [22], the MOST Radio network emulator [23], the Ohio Network Emulator [24], X-kernel [25], and NIST [26].

When comparing simulation software to live testbeds, there are several tradeoffs to consider. Simulators can be a relatively inexpensive means of experimenting with large-scale networks. Using a graphical user interface or structured programming language, a user can create a network consisting of tens or hundreds of nodes and many complex data paths. For example, ns2 is a widely used simulator that can support experiments involving hundreds of nodes and connections (provided enough system memory is available). The ns2 code base is written in C++, with many users contributing new modules to add to its list of supported protocols. Users write scripts in the OTcl language to create the network topologies and control the inputs of the simulator. All of these features combine to provide a simulator that is cheap, can run large-scale experiments with a continuously growing list of protocols, and can be controlled by the user to perform tests that are easily reproducible.

There are drawbacks to using simulators that do not exist in live testbeds. Simulators require that code used in real operating systems and network devices be reproduced. This requires much effort on the programmer's part as well as the need to make assumptions about the underlying hardware or operating system kernel that the real code would interact with. This may lead to differences in the simulator's implementation of the code compared to the real-world implementation. It becomes the responsibility of the simulator programmer and/or the user to validate this code to make sure it functions at an acceptable level of accuracy when experiments are run.

Live testing eliminates the trouble of having to replicate an implementation of a network protocol as is necessary in simulators. Experimenting with real code in a real environment allows for inclusion of factors that may be taken for granted in simulators.

For example, when studying the behavior of TCP in the Linux kernel, there are hundreds of hardware and operating system related variables that might cause variations in the performance of the protocol. Although this increases the level of realism, it also presents new challenges. Scaling a network to a larger number of nodes requires adding more hardware, which can be costly and time consuming. Controlling network conditions is another problem. In the example of TCP, the protocol adapts its behavior to network conditions by modifying variables like the congestion window, slow start threshold, and the retransmission timeout. Controlling the network conditions when using live equipment so that experimental results can be reproduced at an acceptable level of accuracy is a daunting task, even in relatively small networks.

There are several tools that attempt to provide the service of controlling network variables for such experiments. These tools attempt to emulate common effects such as bandwidth limitation, packet loss, buffer size, and packet delay, among others. NIST, for example, is a network emulator that is installed as a Linux patch. It hooks into the Linux kernel and can impose effects such as packet delay, jitter, packet reordering, packet loss (random and congestion-dependent), and bandwidth limitation. X-Kernel was developed as a new operating system that provided an explicit architecture for constructing and composing network protocols. When used with x-Sim [27], network emulation capabilities similar to NIST can be used to test these protocols under controlled conditions.

The problem with using such tools in studying transport layer protocols is that they lack the ability to reveal the masked complexities that are a big part of the protocols. Studying the values of variables such as the congestion window, slow start threshold, and

retransmission timeout is important when experimenting with TCP. An effective testbed would therefore need to meet the following requirements:

- Provide the user with a high level of control over a wide range of network variables
- Provide transparency for all major mechanisms of the protocol under analysis
- Use existing and common implementations of protocols to ensure realistic results
- Provide the user with the ability to incorporate new protocols or make changes to existing ones
- Be cost effective enough for widespread use in education and research

Web100 and Dummynet

The success of TCP as the dominant protocol for data transfer over computer networks is largely due to its simplicity at the application level. The protocol is designed in such a way that the underlying complexities are masked from the user applications that control it. Hiding these details has allowed applications to function under new and old networks, and has promoted the growth of the Internet.

However, many important things happen in the lower layers that should be exposed but are not. For example, packet loss is hidden by TCP's retransmission machinery. If packet loss is due to a flawed network, the only symptom will be reduced performance. The inability to easily observe TCP's inner workings impairs the ability to conduct research in TCP behavior, test new TCP algorithms, educate future protocol researchers, and detect bugs in TCP and lower layers [28].

So the need arises for tools that will expose the inner workings of TCP. The Web100 project [1, 2] was created specifically to develop an advanced management interface for TCP. There are instruments for capturing common events such as segments sent and received, as well as many more subtle events that cause TCP to reduce its transmission rate. The instruments are collectively referred to as the Kernel Instrument Set (KIS), which is the core of the Web100 Project.

The current implementation of Web100 is broken into two parts. The first part is a Linux kernel patch that implements the KIS to collect per-connection statistics on all major protocol events. Each connection has a separate KIS structure and instances that exchange data directly with the network protocol stack. The second part is the Userland, a set of utilities designed to allow easy reading and manipulation of the instruments exposed by the Web100 kernel patch.

With the ability to observe the underlying variables in TCP using Web100, the ability to control network traffic becomes important for testing TCP under various network conditions. Dummynet is a flexible tool for flow management and for testing network protocols [3]. It has been around for several years in its FreeBSD implementation, but is easily portable to other protocol stacks. Dummynet works by intercepting packets as they pass through the protocol stack and passing them through one or more pipes, which can emulate the effects of bandwidth limitation, propagation delay, bounded-size queues, packet loss, and more [3].

By combining the ability to view the behavior of TCP at its lowest levels and controlling the properties of the TCP flow, Web100 and Dummynet have the potential to produce an excellent research and test environment.

Chapter 3 Setting up a Web100-Dummynet Testbed

Testbed Topology and Specifications

The chosen topology to run tests consists of one PC acting as a client, one PC acting as a router, and two PCs acting as servers. This dumbbell topology is the simplest way to test individual as well as competing data flows and control flow behavior. The client machine in this model is a user-end PC, performing normal network applications like FTP, web browsing, Telnet, and SSH. The second PC functions as a router and makes use of Dummynet's traffic shaping capabilities to control flow properties such as bandwidth, queuing, loss rate, etc. The third and fourth machines are set up as servers. These machines have the Web100 patch and utilities installed on them and are the focus of the study for collecting TCP statistics. Figure 3.1 illustrates the topology, and the hardware specifications are listed in Table 3.1.

[Figure 3.1 Network Layout of Testbed]

Table 3.1 Testbed Hardware Specifications

Web100 Server 1: Intel Celeron 1.5 GHz CPU, 128 MB RAM, 3Com 3c905c 100 Mbps NIC, 20 GB HDD, RedHat 9 (kernel 2.4.26)
Web100 Server 2: Intel PIII 800 MHz CPU, 128 MB RAM, 3Com 3c905c 100 Mbps NIC, 30 GB HDD, RedHat 9 (kernel 2.4.26)
Dummynet Router: Intel PIII 800 MHz CPU, 128 MB RAM, 3Com 3c905c 100 Mbps NIC, 30 GB HDD, FreeBSD 5.4
Client PC: Intel PIII 933 MHz CPU, 256 MB RAM, 3Com 3c905c 100 Mbps NIC, 30 GB HDD, RedHat 9 (kernel 2.4.26)

The FreeBSD Dummynet router is set up with three Ethernet cards (xl0, xl1, and xl2) and connected to the client and server machines. Crossover Ethernet cables must be used to link the network cards of the client and servers to cards on the Dummynet router. All network cards are configured for full-duplex 100Mbps maximum data transfer rates. The IP addresses, subnets, and default gateways for each network

interface are also shown in Figure 3.1. Information on configuring network interfaces in Linux/Unix can be found using the "man ifconfig" command.

The specifications of each machine do not reflect the minimum requirements for that machine to perform its functions at acceptable levels. Each machine could have performed its operations just as well with less RAM or a slower CPU. The specifications in Table 3.1 reflect only what was used in these experiments and should not be interpreted as requirements for Web100 or Dummynet. Rather, the minimum requirements depend on the options and configuration of the operating system being used on each machine.

Configuring a Web100 Client

Web100 works by modifying files in a specific version of the Linux kernel in order to expose details of TCP that would otherwise be hidden. At the time this testbed was being configured, the latest version of Web100 was built for the Linux 2.4.26 kernel. The software comes in two forms. The first is a patch that will modify the existing Linux kernel and make it a Web100-enabled kernel. The second is a precompiled kernel that the user installs. The benefit of using a precompiled kernel is that it gives the user the option at startup of selecting the Web100 kernel or the standard Linux kernel that was originally installed. Installing Web100 as a precompiled kernel was the chosen option, and this section describes the procedure for doing so. For information on installing Web100 as a kernel patch, please see the install file that is included with the patch, which can be downloaded from [2]. In addition, all Web100 patches and kernels can be downloaded from [2].

Because each Web100 release is designed for a particular version of the Linux kernel, selecting the appropriate Linux distribution is the first step in setting up a Web100 client machine. It was found that selecting a distribution with a kernel revision number lower than the Web100 kernel being installed seemed to work best. For instance, the Web100 kernel is version 2.4.26, so any Linux kernel version 2.4.X where X is less than 26 could be selected. A copy of RedHat 9, which met this criterion (kernel 2.4.20), was obtained.

Installation of RedHat 9 is fairly simple and straightforward. The software comes packaged with a graphical installation wizard that guides the user through selecting various options for the type of installation they would like to perform. One important thing to note is that RedHat 9 allows the user to choose from various installation classes such as Personal Desktop, Workstation, or Server. Selecting one of these classes specifies which set of software packages and options are to be included during the installation. For the Web100 server machines it was found that the Server class is most appropriate. For more information on installing RedHat 9, please refer to [28].

Upon successful installation of the base Linux distribution, the next step is to proceed with the Web100 kernel installation. The precompiled Web100 kernels come packaged as an RPM file, and installing them is very quick and simple. The commands for installing the kernel can be found in the README.txt file available with each download. Below is an example of the commands used to install the kernel using the RPM file. These commands should be executed (as root) from the same directory where the RPM file is located:

#rpm -ivh kernel-web100-2.4.26-2.3.8.i686.rpm

After the kernel has been installed, the machine should be rebooted. An option will be available at boot time to select the appropriate kernel to load. Select the Web100 kernel version (in this case 2.4.26) to complete the kernel installation process.

In addition to the Web100 kernel, a shared library with a set of utilities is available to provide user-friendly tools for viewing and manipulating TCP flow variables. This library is known as the Web100 Userland and is available for download from [2]. It was found, however, that these tools were unable to serve the requirements of the tests that needed to be performed. In order to accurately collect statistics during the entire lifetime of a TCP connection, a tool was required that would continuously monitor the Web100 client machine for new connections and begin logging Web100 variables for the lifespan of the flow. The Web100 Userland currently has no such tool.

[Figure 3.2 Screenshot of the Web100 Userland]

Fortunately, the need for such a tool was not overlooked by others working in the Web100 community. The Web100 project at ORNL [29] has produced several useful tools, including a trace daemon designed to log user-specified Web100 variables at a set interval. The trace daemon uses a configuration file (traced.conf) that contains the hosts to be monitored as well as the variables to be logged. When a connection to that host is opened, the daemon begins logging the information to a trace (stats) file. The stats file can be analyzed and used to generate charts and graphs. This is very useful for providing visual representations of important TCP events and statistics.

[Figure 3.3 Trace Daemon monitoring an FTP connection]

[Figure 3.4 Excerpt from Trace Daemon conf file]

Configuring a Dummynet Router

The most common methodologies for testing network protocols are simulation and experimental testbeds. While simulation tools provide easier and more cost-effective control over the various operational parameters, it is really difficult to change channel capacities, packet loss rates, delays, traffic loads, etc. in experimental testbeds. On the other hand, simulation tools and models make many simplifying assumptions that cannot mimic real scenarios and networks [3].

In this section, it is shown how Dummynet can be used to manipulate these network path variables while being part of a real network. Dummynet works by intercepting the packets of the protocol layer under test and simulating the effects of finite queues, bandwidth limitations, lossy links, and communication delays. It runs in a fully operational system, hence allowing the use of real traffic generators and protocol implementations while solving the problem of simulating unusual environments. With this tool, doing experiments with network protocols is as simple as running the desired set of applications that are run on a daily basis on a workstation [3].

A FreeBSD implementation of Dummynet is available in all recent FreeBSD releases. The implementation is highly portable to other BSD systems and takes less than 300 lines of kernel code [3]. This section discusses how to set up and configure a FreeBSD system to include Dummynet and briefly covers the basics for using the tool to shape network traffic. It is assumed that the machine that Dummynet is being configured on already has a working copy of the latest FreeBSD release installed on it. If this is not the case, a copy of the operating system and installation guide can be obtained from [30].

The first step to setting up Dummynet is to configure the FreeBSD kernel options. Editing the kernel options is done by creating and editing a custom kernel configuration file. On Intel-based systems, the existing configuration file can be found in the /usr/src/sys/i386/conf directory and has the file name GENERIC. This file represents the current kernel configuration and options included on your FreeBSD system and generally should not be modified directly by the user. A safer approach is to create a copy of this file to edit as your custom kernel configuration file. An example for doing so is as follows (assuming the user is working with root privileges):

# cd /usr/src/sys/i386/conf
# cp GENERIC MYKERNEL

This creates a custom kernel configuration file named MYKERNEL. It can now be edited to include the options required to run Dummynet on the system. Adding options is as simple as including the following lines in your configuration file:

options IPFIREWALL
options DUMMYNET
options HZ=1000

Dummynet works by intercepting packets determined by the rules set in the FreeBSD firewall. Because of this, the IPFIREWALL option must be enabled in the kernel. The HZ option is not required by Dummynet, but is strongly recommended because it increases the frequency of the timer that controls how long packets are held in the system. Setting this value to 1000 increases the accuracy to 1ms and allows for smoother traffic flow. With these options added to the kernel configuration file, the only thing left

to do is compile the new kernel and reboot to allow it to load in the system. The commands for compiling the kernel are as follows:

# cd /usr/src
# make buildkernel KERNCONF=MYKERNEL
# make installkernel KERNCONF=MYKERNEL

More information on configuring and compiling a FreeBSD kernel can be found in the FreeBSD Handbook [31].

In addition to kernel options, there are other FreeBSD system settings that should be enabled for Dummynet to function correctly. All of these settings can be controlled using the sysctl command. The settings and their values are as follows:

net.inet.ip.fw.enable: 1
net.inet.ip.fw.one_pass: 1
net.inet.ip.fw.dyn_buckets: 256
net.inet.ip.fw.curr_dyn_buckets: 256
net.inet.ip.fw.dyn_count: 3
net.inet.ip.fw.dyn_max: 1000
net.link.ether.bridge_cfg: iface(1), ...iface(N)
net.link.ether.bridge: 1
net.link.ether.bridge_ipfw: 1

These settings serve as defaults for the system's firewall and Dummynet. This example has the interfaces configured with bridging enabled. More information on setting the global configuration for the firewall, Dummynet, and bridging can be found in [31].

Traffic shaping in Dummynet is performed through the creation and configuration of pipes. Pipes are the basic elements of the traffic shaper and are used to emulate network links with a certain bandwidth, delay, and packet loss rate. Queues are implemented for weighted fair queuing and must be used with existing pipes [32]. The creation and configuration of pipes and queues are completely controlled by the ipfw command. The example below illustrates how to use this command to create and configure pipes:

# ipfw pipe 1 config bw 50Kbit/s queue 10 delay 50ms
# ipfw pipe 2 config bw 50Kbit/s queue 10 delay 50ms plr .001
# ipfw add pipe 1 udp from 192.168.0.1 to any
# ipfw add pipe 2 ip from any to any

In this example, two separate pipes were created. The first rule limits pipe 1's traffic to UDP packets with a source IP of 192.168.0.1 and any destination IP. The second rule allows any IP packet to pass through the firewall. If no pipes are generated, the firewall will not send or receive any packets. Otherwise, each packet is compared against the rules in sequential order until a match is found. If the packets do not match any of the pipes, they are dropped. The pipes are configured using the config keyword. Pipe 1 in the example is limited to a maximum bandwidth of 50Kbit/s, a queue size of 10 slots (packets), and a delay of 50ms. Pipe 2 has the same configuration as pipe 1, but is also set with a packet loss rate of .001 to simulate a lossy link [32].

Dummynet also includes mechanisms for implementing weighted fair queuing, allowing a pipe's bandwidth to be shared at a certain configurable proportion. These queues must be attached to an existing pipe and cannot function independently.

The following commands illustrate how to set up queues to share a 60Mbit/s pipe at a 40 to 20 ratio.

# ipfw pipe 2 config bw 60Mbit/s
# ipfw queue 2 config pipe 2 weight 40
# ipfw queue 3 config pipe 2 weight 20
# ipfw add 100 queue 2 ip from 192.168.1.2 to any out xmit xl0
# ipfw add 101 queue 3 ip from 192.168.2.2 to any out xmit xl0

The rules are configured so that any traffic with a source IP address of 192.168.1.2 or 192.168.2.2 that is transmitted out of interface xl0 is passed through the appropriate weighted queue. Traffic from 192.168.1.2 is allotted 40Mbit/s, while traffic from 192.168.2.2 is given 20Mbit/s.

With the ability to control path variables with Dummynet and study traffic behavior with Web100, a suitable testing environment for TCP is available for education and research. The next chapter explores the potential of the testbed and provides results from many of the tests that were performed.

Chapter 4 Education and Research in Transport Layer Protocols (TLP)

Educational Experiments and Results

Testing of the Web100-Dummynet testbed shows promising results. A 500MB test file was generated to be used in the file transfers from the Web100 servers to the client. Disk I/O should not be an issue in this case, since the typical internal transfer speeds of the hard disks are in the range of several hundred Mbps. All tests were run using the topology shown in Figure 3.1. The trace daemon was initialized to monitor the connection from the servers and record the desired statistics to an output file. The charts in this section show typical plots with the performance of the connection and the behavior of important TCP variables.

Analysis of a Single TCP Flow under Various Network Conditions

Web100 provides the ability to monitor numerous variables of an active TCP connection. In this first analysis, the goal is to understand how a single TCP flow behaves when certain network conditions are modified. Using Dummynet, changes are made to the bottleneck link speed, packet loss rate, and buffer size. The throughput is studied to demonstrate the effect of these conditions on TCP. The final experiment in this section involves simulating a typical cable modem connection when trying to transfer a file across an overseas link. The server is able to send data at 5Mbps while the client is

limited to 384Kbps. Additional variables, including the congestion window, slow-start threshold, round-trip time, and retransmission timeout, are also included in the charts for this experiment.

Different Buffer Sizes

By modifying the size of the packet buffer in the Dummynet router, the effects on the throughput are shown in Figures 4.1 through 4.3. The experiments are run for 50 seconds and a bandwidth limitation of 15Mbps is applied. Dummynet supports a buffer of up to 100 packets, which is normally more than sufficient for single data flows. Each packet sent is 1448 bytes in size. When using a 5 packet buffer, it can be seen in Figure 4.1 that the throughput ranges between approximately 7 and 10 Mbps. Figure 4.2 shows that increasing the buffer to 20 packets allows for throughput close to the maximum 15 Mbps; however, a much smoother connection is established when the 75 packet buffer is used in Figure 4.3 (a 75 packet buffer of 1448-byte packets holds roughly 108 KB, or about 58 ms worth of data at 15 Mbps).

[Figure 4.1 15Mbps with 5 packet buffer (throughput over 50 seconds)]

[Figure 4.2 15Mbps with 20 packet buffer (throughput over 50 seconds)]

[Figure 4.3 15Mbps with 75 packet buffer (throughput over 50 seconds)]

Different Bottleneck Link Speeds

Studying the throughput at different bottleneck link speeds shows some interesting results. Figure 4.4 plots the instantaneous throughput of a 50 second file transfer when a 15Mbps bandwidth limit is applied. No other constraints were applied in the Dummynet router; however, it should be noted that by default Dummynet uses a buffer size of 50 packets. The spikes in the graph indicate where timeouts are occurring. In Figure 4.5 the bottleneck link is increased to 60Mbps. It is clear from this graph that the number of timeouts increases, as noted by the frequent spikes.

[Figure 4.4 15Mbps bottleneck (throughput over 50 seconds)]

[Figure 4.5 60Mbps bottleneck (throughput over 50 seconds)]

Different Packet Loss Ratios

The ability to control the packet loss rate (PLR) in Dummynet allows for demonstrating how the throughput of a TCP connection is affected at different levels of packet loss. Dummynet uses a random distribution when dropping packets; each packet has an equal drop probability when being considered by Dummynet. In Figure 4.6, a 15Mbps connection is tested using a packet loss probability of .005. Even this small rate of packet loss causes a noticeable drop in the throughput, as the connection is only able to average slightly more than 10Mbps. In Figure 4.7 the PLR is doubled to .01 and the throughput is seen to average approximately 8Mbps. By increasing the packet loss rate to .10, the throughput drops substantially and averages less than 2Mbps.

[Figure 4.6 Instantaneous and Average Throughput at 15Mbps, PLR .005]

[Figure 4.7 Instantaneous and Average Throughput at 15Mbps, PLR .01]

[Figure 4.8 Instantaneous and Average Throughput at 15Mbps, PLR .10]

Broadband Connection across an Overseas Link

The purpose of the experiments in this section is to demonstrate how a Web100-Dummynet testbed can be used to emulate a typical file transfer scenario that users may encounter when transferring files over the Internet. Additionally, these results demonstrate the ability to study TCP mechanisms such as the congestion window, slow start threshold, RTO, and packet round-trip time under these conditions.

To accomplish this, Dummynet is configured with different bandwidth limitations for data originating from the server and the acknowledgements from the client. A typical cable modem connection may provide downstream data rates of up to 5Mbps and an upstream rate of up to 384Kbps for an end user. If this link is assumed to be the bottleneck, then from the perspective of the server the maximum upstream rate is 5Mbps and the rate of acknowledgements coming downstream from the client is at most 384Kbps. In many cases file transfers may occur across an overseas link, resulting in large packet round trip times. To create this condition, a delay of 120ms is applied to all outbound and inbound traffic.

Figure 4.9 illustrates that the throughput of the connection behaves as expected for TCP. There is a sudden increase in the instantaneous throughput at the beginning of the connection, which can be attributed to the slow start phase and the exponential increase of TCP's congestion window. At approximately 30 and 75 seconds, packet loss occurs and the throughput gradually climbs back to 5Mbps. The behavior of the congestion window and slow start threshold in Figure 4.11 reinforces this notion. The average throughput, as shown in Figure 4.10, demonstrates that the connection comes very close to averaging 5Mbps, as our network conditions dictate.

Finally, in Figure 4.12 the plots of the round-trip time and the RTO indicate that the applied delay in Dummynet is working as it should. The TCP sender estimates the packet round-trip time and uses this value to determine the value of the RTO. The RTO is generally calculated by adding to the smoothed RTT estimate a multiple of the deviation of the sample RTT from the estimated RTT [33].
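For reference, the standard estimator matching this description (RFC 2988 style, with alpha = 1/8, beta = 1/4, and a deviation multiplier of 4) can be sketched in C as follows. This is illustrative and not the exact code used by the Linux kernel:

#include <math.h>

struct rto_state {
    double srtt;    /* smoothed RTT estimate (seconds)  */
    double rttvar;  /* smoothed RTT deviation (seconds) */
    double rto;     /* current retransmission timeout   */
    int    primed;  /* has a first sample been taken?   */
};

static void rto_sample(struct rto_state *s, double r /* measured RTT */)
{
    const double alpha = 0.125, beta = 0.25, k = 4.0, min_rto = 1.0;

    if (!s->primed) {
        s->srtt = r;
        s->rttvar = r / 2.0;
        s->primed = 1;
    } else {
        s->rttvar = (1 - beta) * s->rttvar + beta * fabs(s->srtt - r);
        s->srtt = (1 - alpha) * s->srtt + alpha * r;
    }
    s->rto = s->srtt + k * s->rttvar;
    if (s->rto < min_rto)           /* RFC 2988 lower bound */
        s->rto = min_rto;
}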

[Figure 4.9 Throughput of 5Mbps Upstream, 384Kbps Downstream, 240ms RTT]

[Figure 4.10 Avg. Throughput of 5Mbps Upstream, 384Kbps Downstream, 240ms RTT]

[Figure 4.11 Congestion window of 5Mbps upstream, 384Kbps downstream, 240ms RTT (Cwnd, Ssthresh, and Max Cwnd in bytes over time)]

[Figure 4.12 RTT and RTO of 5Mbps upstream, 384Kbps downstream, 240ms RTT (Smoothed RTT and RTO over time)]

Research Experiments and Results

The SF-SACK Protocol

SF-SACK is a SACK-based protocol that provides smooth enough transfer rates for streaming applications while implementing the known flow and congestion control mechanisms of TCP [4, 5]. With the increasing number of real-time applications being used over the Internet, the UDP protocol is becoming more prevalent. UDP, although more suitable for streaming real-time multimedia, is also a very unfriendly protocol. It uses whatever bandwidth is available without consideration for the state of congestion in the network.

SF-SACK has been shown through simulation to provide smooth enough transfer rates for streaming applications while implementing the congestion control mechanisms of TCP. It has also been shown to be considerably fairer than UDP when competing with TCP [5]. The SF-SACK algorithm requires only sender-side modification, but implementing it can be challenging for those who are unfamiliar with programming in the Linux kernel.

How to Incorporate a New Protocol into Web100: The SF-SACK Case

The goal of this section is to discuss how the SF-SACK protocol can be implemented and deployed in the Linux kernel. The steps to accomplishing this task include obtaining the source code, modifying the appropriate functions, and packaging the kernel in a manner that makes it easy to deploy on other systems. The source code of the implementation is available in Appendix A, and the number of lines of code modified or added is approximately 200.

To begin the process of implementing SF-SACK in the Linux kernel, it is first necessary to obtain the kernel that will be modified. Web100 makes available on its website source kernels for those who wish to build or modify the code [2]. The kernel that was chosen for the work in this thesis is Web100 2.3.8, which is based on Linux 2.4.26. The source code is packaged as an RPM file so that it can be deployed easily on the existing RedHat 9 machine. RPMs are discussed in detail a little later. The source RPM can be installed by using the command:

rpm -ivh kernel-web100-{package name}.rpm

This will install the source code into the default RedHat 9 directory:

/usr/src/linux-{kernel version}/

This will be the directory from which all kernel files are referenced in this document. The files in the kernel that will need to be modified are:

/net/netsyms.c
/include/net/sock.h
/include/net/tcp.h
/include/linux/sysctl.h
/net/ipv4/sysctl_net_ipv4.c
/net/ipv4/tcp.c
/net/ipv4/tcp_input.c
/net/ipv4/tcp_ipv4.c
/net/ipv4/tcp_timer.c


Before beginning to implement the algorithm in the code, certain structures and variables must be defined. In the current Linux kernel, the tcp_opt structure is defined in the file sock.h. This structure contains all variables of the TCP algorithms for a TCP connection. A structure to keep track of all of the SF-SACK variables, as well as the timer structure that will be used to recalculate the congestion window at a specified time interval, must be added. The definition of these structures can be viewed in the source code of sock.h in Appendix A.

The next set of variables to be defined are the sysctl variables for SF-SACK. Sysctl variables are used to modify kernel parameters at runtime. Two variables will be defined: one to turn the protocol on or off, and the other to modify the timer interval that the protocol uses to update the congestion window. These variables are named sysctl_tcp_sfsack and sysctl_tcp_sfsack_tau, respectively. Adding these variables to the kernel requires modification of multiple files. The source code for these changes is available in the files tcp.c, sysctl.h, netsyms.c, and sysctl_net_ipv4.c in Appendix A.

The result of defining these sysctl variables is that the protocol can easily be controlled at runtime. For instance, if SF-SACK should be turned off and TCP allowed to behave in its traditional manner, the following command would be executed:

#sysctl -w net.ipv4.tcp_sfsack=0

Conversely, this variable would be set to 1 if SF-SACK should be turned on. If the interval at which the algorithm updates the value of the congestion window needs to be adjusted, the following command can be used:

#sysctl -w net.ipv4.tcp_sfsack_tau=50
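Since sysctl variables are also exposed through the proc filesystem, the current settings can be inspected without modifying them (standard sysctl/procfs behavior, shown here for convenience):

#sysctl net.ipv4.tcp_sfsack
#cat /proc/sys/net/ipv4/tcp_sfsack_tau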


The value of net.ipv4.tcp_sfsack_tau determines when the timer variable sfsack_cwnd_timer expires and the code to recalculate the congestion window is executed. The default is set to 100ms, and in the example above its value is changed to 50ms.

SF-SACK's effectiveness relies on when and how the congestion window is changed. There are two events that trigger the congestion window to be recalculated. The first is a timer, which continuously updates the value of the congestion window based on previous values in the flow's history. The second is a loss event. Since this is such a significant part of the SF-SACK protocol, the first thing that should be done is to define the function that will recalculate the congestion window. The function is defined as tcp_sfsack_cwnd in the file tcp.h.

After the function to recalculate the congestion window is defined, implementation of the required timers can proceed. The timer variable is defined as sfsack_cwnd_timer in the file sock.h. This variable must be initialized, and the function that is executed each time it expires needs to be defined. The file that needs to be modified is tcp_timer.c: the function tcp_init_xmit_timers() must be modified, and a new function, tcp_sfsack_cwnd_timer(), must be added. The code for both of these functions is available in Appendix A.

Other significant changes in the Linux kernel involve the way TCP handles packet loss events. The two types of loss events occur when the sender receives three duplicate acknowledgements (dupacks) or when a timeout occurs. The timeout event is handled by the function tcp_enter_loss() in tcp_input.c. This function is called whenever the retransmit timer has expired and serves (among other things) to set the congestion window to 1. SF-SACK requires that, rather than reducing the congestion window to 1, some previous history be used to recalculate it. To accomplish this, the function tcp_sfsack_timeout() is defined in the file tcp.h.
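The hook itself is small; the following fragment, reproduced from the tcp_enter_loss() changes in Appendix A, shows the pattern used throughout the implementation: the SF-SACK handler is tried first, and only if it declines (i.e., SF-SACK is disabled) does the traditional window reset run.

/*----SFSACK CODE----*/
if (!(tcp_sfsack_timeout(tp))) {
        tp->snd_cwnd = 1;
}
/*----END SFSACK CODE----*/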


Next, the process of how TCP handles loss events due to three dupacks is examined. The new function to handle this event is tcp_sfsack_complete_cwr(), defined in the file tcp.h. In the Linux kernel, whenever the congestion window has to be reduced due to packet loss, a function called tcp_complete_cwr() is called. This function is located in the file tcp_input.c, and it is modified so that if SF-SACK is running, the new function tcp_sfsack_complete_cwr() is used to recalculate the congestion window instead.

With the code in place to handle the recalculation of the congestion window by the timer and at the two loss events, the implementation of SF-SACK in the Linux kernel is complete. The next step is to discuss how the new kernel can be compiled and packaged in a manner that makes it easy to install on other machines.

The RPM Package Manager (RPM is a recursive acronym) is an open packaging system, available for anyone to use, which runs on Red Hat Linux as well as other Linux and UNIX systems [34]. RPM maintains a database of installed packages and their files, which allows for queries and system verification for all packages installed by the tool. It makes updating the system simple because only short commands are required to install, uninstall, and upgrade RPM packages [34].

The Web100 Linux kernel used to implement SF-SACK comes packaged as an RPM from the Web100 website [2]. There are two types of RPM packages: binary and source. Binary packages install all of the linked object files and libraries, while source packages are used to distribute the raw source code. The files that were installed and modified for the SF-SACK implementation came from the Web100 source RPM.
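The RPM database mentioned above can be queried and verified at any time; for example (standard rpm options, shown for illustration):

#rpm -qa | grep kernel
#rpm -V {package name}

The first command lists the installed kernel packages; the second verifies the files of an installed package against the database.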


The next step is to create a binary RPM that can be used to install the new kernel on other machines. The process starts by first moving to the directory where the source files are located; in this case it is the directory /usr/src/linux-2.4.26-2.3.8/. The source tree, which is required by RPM to compile the package, must now be prepared:

#make mrproper

With the source tree prepared, the next step is to select the configuration options for the kernel. This is simply a way to specify which drivers and functions should be included. It is usually easiest to start with one of the standard RedHat config files and then customize it to suit individual needs. To do this, a copy of an existing configuration file is placed at the root of the kernel source directory, and the make command is used to configure the kernel:

#cp configs/kernel-2.4.26-2.3.8-i686.config .config
#make oldconfig

The second command in this sequence is used to configure the new kernel and skip options that are known to work. It will only stop the configuration for new or modified options that are not configured in the existing system. This helps to save time because the user does not have to start with an empty kernel configuration and select every option that should be included.

After the kernel configuration is complete, some additional customization can be done by using one of the following commands:


#make xconfig

or

#make menuconfig

These commands launch an interactive interface (graphical for xconfig, text-based menus for menuconfig) for adding or removing options from the existing kernel configuration produced by the make oldconfig command. For instance, if there is a set of drivers that will probably not be needed in this kernel, the user can easily remove them using the interface provided by these commands. Removing unnecessary options can prove very useful in reducing the compile time of the kernel.

After the configuration has been customized, building of the actual RPM file can proceed. The command to do this is very simple:

#make rpm

This begins the process of indexing all of the required files for the package and then compiling the code. It should be noted that, because of customizations that may have been made to the kernel configuration, the RPM build process may pause whenever it encounters a new or changed option. The user will then need to decide whether to include each such option. If there are any problems during compilation, the build process will exit with an error, and the user will have to correct the problem and execute the make rpm command again.

If all goes well and the process completes successfully, the resulting RPM package can be retrieved from the directory /usr/src/redhat/RPMS. This RPM can now be copied to other machines and installed by using the command:

#rpm -ivh {package name}.rpm
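Before copying the package to other machines, its metadata can be inspected directly from the file (the i686 subdirectory and wildcard below are illustrative; the exact path depends on the build architecture and version):

#rpm -qpi /usr/src/redhat/RPMS/i686/kernel-web100-*.rpm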


With the kernel installed on a new system, the final step is to add the appropriate lines to the grub.conf file so that the new kernel can be selected when the machine is restarted. The grub.conf file is located in the directory /boot/grub and should be modified as described in [35].

Comparison of TCP SACK and TCP SF-SACK

Fairness and Smoothness of Competing Flows

In this section the results from several experiments comparing the SACK and SF-SACK protocols are shown. Figures 4.13 and 4.14 show results from experiments run in the ns2 simulator. Figure 4.13 illustrates two TCP SACK connections competing for a 15Mbps bottleneck link; the throughput and congestion window values over a 300 second period are plotted. Figure 4.14 repeats this experiment with SF-SACK. By comparing the results, it can be seen that SF-SACK produces less variation in the congestion window, resulting in smoother throughput than SACK.

In Figures 4.15 and 4.16, experiments are run in the Web100-Dummynet testbed using the same bottleneck link speed and round-trip time. By comparing these results to the plots from the ns2 simulations, it is shown that the testbed implementation of SF-SACK produces similar results: the congestion window variance is less than that of SACK and the throughput is also smoother. Figure 4.17 illustrates the behavior of SF-SACK and SACK when they are competing against each other. In this case, the noticeable difference is that SF-SACK obtains a larger percentage of the bottleneck link


bandwidth than SACK. This behavior is expected due to the smoother decrease in the congestion window inherent to SF-SACK. A closer comparison of the congestion window, as shown in Figures 4.18 through 4.20, reinforces this notion.

Table 4.1 provides the mean and standard deviation of each of the flows. The standard deviation of the instantaneous throughput is less in the SF-SACK vs. SF-SACK experiment than it is in the SACK vs. SACK experiment, while the mean throughput of the competing flows remains relatively fair in both cases. In the case of SACK competing against SF-SACK, the mean throughput of SF-SACK is more than double that of SACK. This data indicates that while SF-SACK is friendly to itself in the same way SACK is, it is not necessarily friendly when competing against SACK. It is still much friendlier than UDP would be in a similar experiment [5]. In either case, it provides a smoother rate of data transfer.


[Figure 4.13: SACK vs. SACK, 15Mbps, 20ms RTT, ns2 Simulation]
[Figure 4.14: SF-SACK vs. SF-SACK, 15Mbps, 20ms RTT, ns2 Simulation]


[Figure 4.15: SACK vs. SACK, 15Mbps, 20ms RTT (throughput and congestion window vs. time)]
[Figure 4.16: SF-SACK vs. SF-SACK, 15Mbps, 20ms RTT (throughput and congestion window vs. time)]
[Figure 4.17: SACK vs. SF-SACK, 15Mbps, 20ms RTT (throughput and congestion window vs. time)]


[Figure 4.18: Congestion window of SACK vs. SACK (15Mbps, 20ms RTT)]
[Figure 4.19: Congestion window of SACK vs. SF-SACK (15Mbps, 20ms RTT)]
[Figure 4.20: Congestion window of SF-SACK vs. SF-SACK (15Mbps, 20ms RTT)]


Table 4.1 Mean throughput and standard deviation of competing SACK and SF-SACK connections, 15Mbps bottleneck, 20ms RTT

Protocols (Server1 / Server2)    Mean (Mbps)      Standard Deviation (Mbps)
SACK / SACK                      7.48 / 7.23      2.14 / 2.50
SACK / SF-SACK                   4.56 / 9.89      1.89 / 1.59
SF-SACK / SF-SACK                7.72 / 6.75      1.63 / 1.64

Different Queuing Strategies

The experiments performed in this section are meant to illustrate the effects of buffer management schemes and weighted queues on competing SF-SACK flows. Dummynet features the DropTail and Random Early Detection (RED) queuing strategies. These experiments use a buffer size of 30 packets and the following RED parameters:

w_q = 0.002
min_th = 5
max_th = 15
max_p = 0.10

By plotting the average throughput, the effects of the queuing strategy on the fairness of the connections can be seen. Both queuing strategies maintain a reasonable level of fairness between the competing flows. After approximately 50 seconds, neither of the SF-SACK connections in Figures 4.21 and 4.22 deviates above 8Mbps or below 6Mbps. However, in Figure 4.22 the effect of the RED queuing strategy is illustrated by the more rapid convergence of the flows, at approximately 150 seconds, compared to approximately 200 seconds under the DropTail queuing strategy.
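For reference, a Dummynet pipe matching these experiments might be configured as follows (a sketch: the pipe number and the catch-all rule are placeholders, and the RED parameters are written in the w_q/min_th/max_th/max_p form that ipfw expects):

#ipfw add pipe 1 ip from any to any
#ipfw pipe 1 config bw 15Mbit/s delay 10ms queue 30 red 0.002/5/15/0.1

The 10ms one-way delay corresponds to the 20ms round-trip time used throughout these experiments.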


[Figure 4.21: SF-SACK vs. SF-SACK, DropTail queue (average throughput vs. time, 15Mbps, 20ms RTT)]
[Figure 4.22: SF-SACK vs. SF-SACK, RED queue (average throughput vs. time, 15Mbps, 20ms RTT)]


Weighted Bandwidth Queues

By configuring weighted fair queuing (WFQ) in Dummynet, the experiments in this section illustrate the behavior of SF-SACK when the two flows are allocated different ratios of the available bandwidth. In this case, Dummynet is configured to weight the bandwidth of a 60Mbps pipe in a 2 to 1 ratio. Figures 4.23 and 4.24 show the instantaneous throughput and the average throughput, respectively. The first flow averages very close to 40Mbps while the second flow averages about 20Mbps. The results indicate that the protocol behaves as expected when the WFQ constraints are applied in the testbed.

[Figure 4.23: Throughput of SF-SACK vs. SF-SACK, 40:20 weighted queue (60Mbps)]
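A weighted configuration of this kind can be expressed in Dummynet by attaching two queues with different weights to a single pipe (a sketch: the pipe, queue, and rule numbers, as well as the server1/server2 addresses, are placeholders):

#ipfw pipe 1 config bw 60Mbit/s
#ipfw queue 1 config pipe 1 weight 2
#ipfw queue 2 config pipe 1 weight 1
#ipfw add queue 1 ip from server1 to any
#ipfw add queue 2 ip from server2 to any

When both flows are backlogged, Dummynet's WF2Q+ scheduler shares the 60Mbps pipe between them in a 2:1 ratio, which is the 40:20 split observed in Figures 4.23 and 4.24.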


[Figure 4.24: Avg. throughput of SF-SACK vs. SF-SACK, 40:20 weighted queue (60Mbps)]


Chapter 5

Conclusions

This thesis describes the network topology, architecture, and hardware and software specifications needed to implement and use a Web100-Dummynet testbed for research and education in TCP. By modifying network conditions including round-trip time, bottleneck bandwidth, packet loss rate, and buffer size, it is shown that educators can study various aspects of TCP under varying conditions. Studying variables such as the congestion window, slow-start threshold, and throughput in a live network provides a new dimension when trying to understand how the protocol works.

The implementation of the SF-SACK protocol in the Linux kernel illustrates the benefits of using this testbed for research purposes. Often, new protocols are tested in simulation software and assumed to behave accordingly in live networks. By using the tools described in this thesis, it is shown that researchers can perform a number of tests on a new transport layer protocol to ensure correct behavior under specific network conditions.


References

[1] M. Mathis, J. Heffner, and R. Reddy, "Web100: Extended TCP Instrumentation for Research, Education and Diagnosis," ACM Computer Communications Review, Vol. 33, No. 3, July 2003.

[2] The Web100 Project, URL: http://www.web100.org

[3] L. Rizzo, "Dummynet: A Simple Approach to the Evaluation of Network Protocols," ACM Computer Communication Review, Vol. 27, No. 1, January 1997.

[4] S. Bakthavachalu and M. A. Labrador, "SF-SACK: A Smooth Friendly TCP SACK-based Protocol," Department of Computer Science and Engineering, University of South Florida, 2004.

[5] S. Bakthavachalu, S. Bassi, X. Jianxuan, and M. A. Labrador, "An Additive Increase Smooth Decrease (AISD) Strategy for Data and Streaming Applications," submitted to IEEE INFOCOM 2006, Barcelona, 2006.

[6] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, "TCP Selective Acknowledgement Options," RFC 2018, April 1996.

[7] C. Casetti, M. Gerla, S. Mascolo, M. Y. Sanadidi, and R. Wang, "TCP Westwood: Bandwidth Estimation for Enhanced Transport over Wireless Links," in Proceedings of Mobicom 2001, Rome, Italy, July 2001.

[8] K. Yamada, R. Wang, M. Y. Sanadidi, and M. Gerla, "TCP Westwood with Agile Probing: Dealing with Dynamic, Large, Leaky Pipes," in Proceedings of IEEE ICC 2003, Anchorage, AK, USA, May 2003.

[9] NS-2 Network Simulator, http://www.isi.edu/nsnam/ns, version 2.28, January 2004.

[10] R. Wang, G. Pau, K. Yamada, M. Y. Sanadidi, and M. Gerla, "TCP Startup Performance in Large Bandwidth Delay Networks," in Proceedings of IEEE INFOCOM 2004, Hong Kong, 2004.

[11] L. Brakmo and L. Peterson, "TCP Vegas: End to End Congestion Avoidance on a Global Internet," IEEE Journal on Selected Areas in Communications, Vol. 13, No. 8, pp. 1465-1480, 1995.


[12] M. Gerla, M. Y. Sanadidi, R. Wang, and M. Valla, "TCP Westwood: Efficient Transport for High-speed Wired/Wireless Networks," Workshop on High-Speed Networking 2002, in conjunction with IEEE INFOCOM 2002, New York, USA, June 2002.

[13] S. Floyd, "HighSpeed TCP for Large Congestion Windows," Internet draft draft-ietf-tsvwg-highspeed-01.txt, work in progress, August 2003.

[14] X. Jianxuan, S. Kerkar, and M. A. Labrador, "Performance Evaluation of TCP over Optical Links," Department of Computer Science and Engineering, University of South Florida.

[15] N.S. (ns2), http://www.isi.edu/nsnam/ns

[16] GNU General Public License, http://www.gnu.org/licenses/gpl.html

[17] FreeBSD, Wikipedia, the Free Encyclopedia, http://en.wikipedia.org/wiki/Freebsd

[18] The 4.4BSD Copyright, http://www.freebsd.org/copyright/license.html

[19] Scalable Simulation Framework, http://www.ssfnet.org

[20] X. Zeng, R. Bagrodia, and M. Gerla, "GloMoSim: A Library for Parallel Simulation of Large-scale Wireless Networks," Department of Computer Science, University of California, Los Angeles.

[21] C. Alaettinoglu, A. U. Shankar, K. Dussa-Zieger, and I. Matta, "Design and Implementation of MaRS: A Routing Testbed," Department of Computer Science, University of Maryland, 1993.

[22] P. Danzig, Z. Liu, and L. Yan, "An Evaluation of TCP Vegas by Live Emulation," Computer Science Department, University of Southern California.

[23] N. Davies, G. S. Blair, A. Friday, A. D. Cross, and P. F. Raven, "Mobile Open Systems Technologies for the Utilities Industries," in Remote Cooperation: CSCW for Mobile and Teleworkers, Ed. A. Dix, Springer Verlag, 1996.

[24] M. Allman, A. Caldwell, and S. Ostermann, "ONE: The Ohio Network Emulator," Technical Report TR-19972, Ohio University Computer Science, 1997.

[25] x-kernel Home Page, http://www.cs.arizona.edu/xkernel


[26] M. Carson and D. Santay, "NIST Net: A Linux-based Network Emulation Tool," National Institute of Standards and Technology (NIST), 2003.

[27] L. S. Brakmo and L. L. Peterson, "Experiences with Network Simulation," University of Arizona, 1996.

[28] RedHat 9 Manual, http://www.redhat.com/docs/manuals/linux/RHL-9-Manual

[29] Web100 at ORNL, http://www.csm.ornl.gov/~dunigan/netperf/web100.html

[30] The FreeBSD Project, http://www.freebsd.org

[31] FreeBSD Handbook, http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook

[32] A. Popa, "Using Dummynet for Traffic Shaping on FreeBSD," BSDnews, Issue #2, http://www.bsdnews.org/02/dummynet.php

[33] P. Sarolahti and A. Kuznetsov, "Congestion Control in Linux TCP," in Proceedings of USENIX 2002/FREENIX Track, pp. 49-62, CA, USA, June 2002.

[34] Creating RPMs, http://erizo.ucdavis.edu/~dmk/notes/RPMs/Creating_RPMs.html

[35] GRUB, http://www.redhat.com/docs/manuals/linux/RHL-7.2-Manual/ref-guide/ch-grub.html


Bibliography

[1] K. Wehrle, F. Pahlke, H. Ritter, D. Muller, and M. Bechler, The Linux Networking Architecture: Design and Implementation of Network Protocols in the Linux Kernel, Pearson Education, Inc., 2005.

[2] Red Hat Linux 9: Red Hat Linux Customization Guide, URL: http://www.redhat.com/docs/manuals/linux/RHL-9-Manual/custom-guide/

[3] J. F. Kurose and K. W. Ross, Computer Networking: A Top-Down Approach Featuring the Internet, Pearson Education, Inc., 2003.


Appendices


Appendix A: Source Code for Implementation of SF-SACK in the Linux 2.4.26 Kernel

SOCK.H

.
/* TCP SFSACK structure */
struct {
        unsigned long tau;          /* used for sampling rate and cwnd calculation */
        __u32 current_cwnd;
        __u32 last_cwnd_sample;
        unsigned long last_drop_time;
        unsigned long drop_interval;
        __u32 cwnd_sample;
        int not_first_drop;         /* flag that indicates if first drop event has occurred */
        int drop_event;             /* indicates if a drop event has occurred */
        unsigned long sock_id;      /* used for debugging purposes */
        __u32 cwnd_down_limit;
        int cwnd_decreasing;        /* determines if cwnd is currently decreasing or not */
} sfsack;

struct timer_list sfsack_cwnd_timer;
/* END TCP SFSACK CODE */
.


Appendix A: (Continued)

SYSCTL_NET_IPV4.C

.
{NET_TCP_LOW_LATENCY, "tcp_low_latency", &sysctl_tcp_low_latency,
 sizeof(int), 0644, NULL, &proc_dointvec},
{NET_IPV4_IPFRAG_SECRET_INTERVAL, "ipfrag_secret_interval",
 &sysctl_ipfrag_secret_interval, sizeof(int), 0644, NULL,
 &proc_dointvec_jiffies, &sysctl_jiffies},
{NET_TCP_WESTWOOD, "tcp_westwood", &sysctl_tcp_westwood,
 sizeof(int), 0644, NULL, &proc_dointvec},
/*----SFSACK CODE----*/
{NET_TCP_SFSACK, "tcp_sfsack", &sysctl_tcp_sfsack,
 sizeof(int), 0644, NULL, &proc_dointvec},
{NET_TCP_SFSACK_TAU, "tcp_sfsack_tau", &sysctl_tcp_sfsack_tau,
 sizeof(int), 0644, NULL, &proc_dointvec_jiffies, &sysctl_jiffies},
/*----END SFSACK CODE----*/
.


Appendix A: (Continued)

NETSYMS.C

.
EXPORT_SYMBOL(tcp_sendpage);
EXPORT_SYMBOL(sysctl_tcp_low_latency);
/*----SFSACK CODE----*/
EXPORT_SYMBOL(sysctl_tcp_sfsack);
EXPORT_SYMBOL(sysctl_tcp_sfsack_tau);
/*----END SFSACK CODE----*/
.


Appendix A: (Continued)

TCP_TIMER.C

.
static void tcp_write_timer(unsigned long);
static void tcp_delack_timer(unsigned long);
static void tcp_keepalive_timer(unsigned long data);
static void tcp_sfsack_cwnd_timer(unsigned long data); /* SFSACK CODE */
.
void tcp_init_xmit_timers(struct sock *sk)
{
        struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;

        init_timer(&tp->retransmit_timer);
        tp->retransmit_timer.function = &tcp_write_timer;
        tp->retransmit_timer.data = (unsigned long)sk;
        tp->pending = 0;

        init_timer(&tp->delack_timer);
        tp->delack_timer.function = &tcp_delack_timer;
        tp->delack_timer.data = (unsigned long)sk;
        tp->ack.pending = 0;

        init_timer(&sk->timer);
        sk->timer.function = &tcp_keepalive_timer;
        sk->timer.data = (unsigned long)sk;

        /*----SFSACK CODE----*/
        sysctl_tcp_sfsack_tau = TCP_SFSACK_TAU * 2;
        tp->sfsack.tau = TCP_SFSACK_TAU * 2; // make sure tau is set
        if (sysctl_tcp_sfsack == 1) {
                init_timer(&tp->sfsack_cwnd_timer);
                tp->sfsack_cwnd_timer.function = &tcp_sfsack_cwnd_timer;
                /* tp->sfsack_cwnd_timer.expires = tp->sfsack.tau; */
                /* tp->sfsack_cwnd_timer.expires = TCP_SFSACK_TAU; */
                tp->sfsack_cwnd_timer.data = (unsigned long)sk;
                if (!timer_pending(&tp->sfsack_cwnd_timer)) {
                        tp->sfsack_cwnd_timer.expires = jiffies + TCP_SFSACK_TAU;
                        add_timer(&tp->sfsack_cwnd_timer);
                }
        }
        /*----END SFSACK CODE----*/
}


Appendix A: (Continued)

.
/*----SFSACK CODE----*/
static void tcp_sfsack_cwnd_timer(unsigned long data)
{
        struct sock *sk = (struct sock *)data;
        struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
        int ret = 0;
        unsigned long glob_tcp_sfsack_tau = TCP_SFSACK_TAU;

        ret = tcp_sfsack_cwnd(tp);
        if (!mod_timer(&tp->sfsack_cwnd_timer, jiffies + TCP_SFSACK_TAU))
                sock_hold(sk);
}
/*----END SFSACK CODE----*/
.
void tcp_clear_xmit_timers(struct sock *sk)
{
        struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;

        tp->pending = 0;
        if (timer_pending(&tp->retransmit_timer) && del_timer(&tp->retransmit_timer))
                __sock_put(sk);

        tp->ack.pending = 0;
        tp->ack.blocked = 0;
        if (timer_pending(&tp->delack_timer) && del_timer(&tp->delack_timer))
                __sock_put(sk);

        /*----SFSACK DELETE TIMER----*/
        if (timer_pending(&tp->sfsack_cwnd_timer) && del_timer(&tp->sfsack_cwnd_timer))
                __sock_put(sk);
        /*----END SFSACK DELETE TIMER----*/

        if (timer_pending(&sk->timer) && del_timer(&sk->timer))
                __sock_put(sk);
}
.


Appendix A: (Continued)

TCP.C

.
int sysctl_tcp_mem[3];
int sysctl_tcp_wmem[3] = { 4*1024, 16*1024, 128*1024 };
int sysctl_tcp_rmem[3] = { 4*1024, 87380, 87380*2 };
int sysctl_tcp_sfsack = 0;          /* SFSACK default is off */
int sysctl_tcp_sfsack_tau = 100;    /* SFSACK 100 ms default */
.


Appendix A: (Continued)

TCP_INPUT.C

.
void tcp_enter_loss(struct sock *sk, int how)
{
        struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
        struct sk_buff *skb;
        int cnt = 0;

        WEB100_UPDATE_FUNC(tp, web100_update_congestion(tp, 0));

        /* Reduce ssthresh if it has not yet been made inside this window. */
        if (tp->ca_state <= TCP_CA_Disorder ||
            tp->snd_una == tp->high_seq ||
            (tp->ca_state == TCP_CA_Loss && !tp->retransmits)) {
                tp->prior_ssthresh = tcp_current_ssthresh(tp);
                if (!(tcp_westwood_ssthresh(tp))) {
                        tp->snd_ssthresh = tcp_recalc_ssthresh(tp);
                }
        }
        /*----SFSACK CODE----*/
        if (!(tcp_sfsack_timeout(tp))) {
                tp->snd_cwnd = 1;
        }
        /*----END SFSACK CODE----*/
.


Appendix A: (Continued)

static void tcp_cwnd_down(struct tcp_opt *tp)
{
        int decr = tp->snd_cwnd_cnt + 1;
        __u32 limit;

        /* TCP Westwood
           Here limit is evaluated as BWestimation*RTTmin (for obtaining it
           in packets we use mss_cache). If sysctl_tcp_westwood is off
           tcp_westwood_bw_rttmin() returns 0. In such case snd_ssthresh is
           still used as usual. It prevents other strange cases in which
           BWE*RTTmin could assume value 0. It should not happen but... */

        /*----SFSACK CODE----*/
        if (sysctl_tcp_sfsack != 0) {
                if (tp->sfsack.cwnd_decreasing == 1) {
                        limit = tp->sfsack.cwnd_down_limit;
                } else {
                        tp->sfsack.cwnd_sample = tp->snd_cwnd >> 1;
                        if (tp->sfsack.not_first_drop == 1) {
                                limit = tcp_sfsack_calc_cwnd(tp);
                        } else {
                                limit = tp->snd_ssthresh / 2;
                                tp->sfsack.not_first_drop = 1;
                                tp->sfsack.last_drop_time = jiffies;
                                tp->sfsack.last_cwnd_sample = tp->sfsack.cwnd_sample;
                        }
                        tp->sfsack.cwnd_down_limit = limit;
                        tp->sfsack.cwnd_decreasing = 1;
                }
        }
        /*----END SFSACK CODE----*/
.
static __inline__ void tcp_complete_cwr(struct tcp_opt *tp)
{
        if (!(tcp_sfsack_complete_cwr(tp))) { /* SFSACK CODE */
                if (!(tcp_westwood_complete_cwr(tp))) {
                        tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh);
                }
        }
        tp->snd_cwnd_stamp = tcp_time_stamp;
}


Appendix A: (Continued)

.
static __inline__ void tcp_cong_avoid(struct tcp_opt *tp)
{
        .
        tp->snd_cwnd = min(tp->snd_cwnd, (__u32)tp->snd_cwnd_clamp);
        /*----SFSACK CODE----*/
        if (sysctl_tcp_sfsack != 0) {
                tp->sfsack.cwnd_decreasing = 0;
        }
        /*----END SFSACK CODE----*/
        tp->snd_cwnd_stamp = tcp_time_stamp;
}
.


Appendix A: (Continued)

TCP.H

.
#define TCP_KEEPALIVE_TIME   (120*60*HZ) /* two hours */
#define TCP_KEEPALIVE_PROBES 9           /* Max of 9 keepalive probes */
#define TCP_KEEPALIVE_INTVL  (75*HZ)

/* SFSACK CODE */
#define TCP_SFSACK_TAU ((unsigned)(50*HZ/1000)) /* 50msec timer for sfsack_tau */
/* END SFSACK CODE */
.
extern int sysctl_tcp_low_latency;
extern int sysctl_tcp_westwood;
extern int sysctl_tcp_sfsack;     //sfsack code
extern int sysctl_tcp_sfsack_tau; //sfsack code ... frequency of sampling for cwnd in sfsack
.
#ifdef TCP_CLEAR_TIMERS
        if (timer_pending(&tp->delack_timer) && del_timer(&tp->delack_timer))
                __sock_put(sk);
#endif
#ifdef TCP_CLEAR_TIMERS
        /* SFSACK CODE */
        if (timer_pending(&tp->sfsack_cwnd_timer) && del_timer(&tp->sfsack_cwnd_timer))
                __sock_put(sk);
#endif
.
/*----SFSACK CODE----*/
static inline __u32 tcp_sfsack_calc_cwnd(struct tcp_opt *tp)
{
        double d_cwnd;
        double alpha, beta;
        __u32 cwnd;


Appendix A: (Continued)

        tp->sfsack.drop_interval = jiffies - tp->sfsack.last_drop_time;
        tp->sfsack.current_cwnd = tp->snd_cwnd;

        if (tp->sfsack.drop_interval <= tp->sfsack.tau) {
                alpha = (2.0 * (double)tp->sfsack.tau) /
                        ((double)tp->sfsack.drop_interval) - 1.0;
                beta  = (2.0 * (double)tp->sfsack.tau) /
                        ((double)tp->sfsack.drop_interval) + 1.0;
        } else {
                alpha = 3.0;
                beta = 5.0;
        }

        d_cwnd = (alpha / beta) * (double)tp->sfsack.current_cwnd;
        d_cwnd = d_cwnd + ((1.0 / beta) * ((double)tp->sfsack.cwnd_sample +
                 (double)tp->sfsack.last_cwnd_sample));

        if (d_cwnd < 0) {
                d_cwnd = 0;
                printk(KERN_INFO "TCP_SFSACK ERROR: d_cwnd < 0");
        }

        tp->sfsack.last_drop_time = jiffies;
        tp->sfsack.last_cwnd_sample = tp->sfsack.cwnd_sample;
        cwnd = (__u32)d_cwnd;
        return cwnd;
}

static inline int tcp_sfsack_cwnd(struct tcp_opt *tp)
{
        int ret = 0;
        __u32 cwnd;

        if (sysctl_tcp_sfsack != 0) {
                if (tp->sfsack.not_first_drop != 0) {
                        tp->sfsack.drop_interval = jiffies - tp->sfsack.last_drop_time;
                        tp->sfsack.current_cwnd = tp->snd_cwnd;
                        if (tp->sfsack.drop_event != 1) {
                                tp->sfsack.cwnd_sample = tp->snd_cwnd;
                        }
                        if (tp->sfsack.drop_event == 1) {


Appendix A: (Continued)

                                tp->snd_cwnd = tcp_sfsack_calc_cwnd(tp);
                        }
                        tp->sfsack.last_cwnd_sample = tp->sfsack.cwnd_sample;
                        ret = 1;
                }
                tp->sfsack.drop_event = 0;
        }
        return ret;
}

static inline int tcp_sfsack_complete_cwr(struct tcp_opt *tp)
{
        int ret = 0;

        if (sysctl_tcp_sfsack == 1) {
                tp->sfsack.cwnd_sample = tp->snd_cwnd >> 1;
                tp->sfsack.drop_event = 1;
                if (tcp_sfsack_cwnd(tp)) {
                        tp->snd_ssthresh = tp->snd_cwnd;
                        ret = 1;
                        printk(KERN_INFO "ret = %d\n", ret);
                } else {
                        tp->sfsack.last_drop_time = jiffies;
                }
                if (tp->sfsack.not_first_drop != 1) {
                        tp->sfsack.last_cwnd_sample = tp->sfsack.cwnd_sample;
                }
                tp->sfsack.not_first_drop = 1;
                tp->sfsack.drop_event = 0;
        }
        return ret;
}

static inline int tcp_sfsack_timeout(struct tcp_opt *tp)
{
        int ret = 0;

        if (sysctl_tcp_sfsack == 1) {
                tp->sfsack.cwnd_sample = 1;
                tp->sfsack.drop_event = 1;
                if (tcp_sfsack_cwnd(tp)) {
                        ret = 1;


Appendix A: (Continued)

                } else {
                        tp->sfsack.last_drop_time = jiffies;
                }
                if (tp->sfsack.not_first_drop != 1) {
                        tp->sfsack.last_cwnd_sample = tp->sfsack.cwnd_sample;
                }
                tp->sfsack.not_first_drop = 1;
                tp->sfsack.drop_event = 0;
        }
        return ret;
}
/*----END SFSACK CODE----*/
#endif /* _TCP_H */


Appendix A: (Continued)

TCP_IPV4.C

.
static int tcp_v4_init_sock(struct sock *sk)
{
        .
        sk->sndbuf = sysctl_tcp_wmem[1];
        sk->rcvbuf = sysctl_tcp_rmem[1];

        /*----SFSACK CODE----*/
        tp->sfsack.not_first_drop = 0;
        tp->sfsack.tau = TCP_SFSACK_TAU * 2;
        tp->sfsack.sock_id = jiffies;
        /*----END SFSACK CODE----*/
        .
}
.