Traffic shaping and bandwidth management.
This article is written from the perspective of a small network, perhaps a few people sharing one connection. It explains traffic shaping and bandwidth management in general and with Linux in particular. It (sort of) walks through building a traffic shaping router for well under $25.
Why share
---------
The easy answer is: "Because a double-speed connection does not cost double". This is already good. Even better is the fact that most of the time you can use the whole connection all by yourself.
The image shows my bandwidth usage graphs for May 12, 2004. As you can see, the traffic is bursty (as most internet traffic is). It is also plain to see that most of the time the bandwidth isn't completely in use.

When there are collisions (multiple hosts using the net at the same time), they very seldom fill the whole link. If they do, however, both hosts get the same bandwidth (you can see it near 800min), or better yet, the traffic with higher priority gets an advantage. But more on that later.
System requirements
-------------------
You don't need much. A P100 with 32MB RAM would do. Actually, some people prefer a 486, because those can be run without a fan on the processor (less noise). I haven't actually tried a traffic-shaping router on one of those (just routers/firewalls), but I would imagine they probably work OK anyway.
You should probably have a HDD (a couple of hundred megabytes) for the software (although you could probably use a CD). Again, silence freaks try to load most or all of the system into RAM, so that the box is quieter. Go for it, if you want to and can do it.
We'll use Linux (I use Debian, but any distro would do (obviously, you might want to steer clear of Mandrake, Suse, Red Hat and the like, since they suck! I mean, they might be too bloated for this thing)) with kernel 2.4.* (get a recent one, okay, since HTB support hasn't been in there forever). You need the iptables and iproute2 packages for firewalling/bandwidth management. Other stuff like an ssh server, too, but that's obvious, isn't it?
Such a setup will be very stable. My boxes have uptimes measured in months and are usually rebooted only for a new kernel, or because I think I'm logged into some other box and accidentally reboot the wrong machine, or because my neighbours get pissed about the music and cut the power lines. In any case, they survive pretty hectic loads as well.
Starting to build
-----------------
This section is just about basic stuff, a sort of vague step-by-step guide. It is not detailed, nor is it the best way to do things. It's just to give some idea of what should be done to get the router up and running. If you already know how to do that, you probably want to do it your own (better) way.
To get Debian running on an old box I usually download the installation disks and boot from them, configure the network interface(s), do an FTP install, get the ssh server running and get lilo booting without hellish delays (also, if using ext2, set the init scripts not to wait forever for the root password in case the file system has been damaged). After that, I strip the monitor and keyboard and dump the rest on the network.
Now, you definitely want to place the router someplace where it can actually do something (so that you can test it). I put mine on my network border and laughed in people's faces when they complained about problems that I caused by fucking around. This is very much fun, so I suggest you do the same.
To get the router routing, you need to have both of your network interfaces up and working, and IP forwarding switched on in the kernel. Something like this will do the trick:
echo 1 > /proc/sys/net/ipv4/ip_forward
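Should you need NAT functionality so that the boxes behind the router can share one public address, this might come in handy as an example (PREROUTING does not accept -o, so source NAT goes into the POSTROUTING chain):
iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source your_net_ip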
For security, you should set the FORWARD chain policy to drop and specifically allow certain subnets/interfaces.
iptables -P FORWARD DROP
iptables -A FORWARD -i eth0 -o eth1 --source 192.168.0.0/24 -j ACCEPT
iptables -A FORWARD -i eth1 -o eth0 --destination 192.168.0.0/24 -j ACCEPT
It is a good idea to put them in some script that will run every reboot.
On a Debian box, ip_forward is read from '/etc/network/options' and you can do '/etc/init.d/iptables save active' to save your current firewall configuration as the default.
If you need anything like DHCP, it won't be a problem.
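The forwarding and firewall steps in this section could be collected into a boot script along these lines. This is only a sketch: the interface names (eth0 inside, eth1 outside) and the 192.168.0.0/24 subnet follow the rules above, while the RUN variable and the function name are made up for illustration. By default it only prints the commands, so you can read them before running anything as root.

```shell
#!/bin/sh
# Sketch of a boot-time router setup script.
# RUN defaults to "echo", so the commands are only printed.
# Run it as root with RUN= (empty) to actually apply them.
RUN="${RUN-echo}"
LAN_IF="${LAN_IF:-eth0}"               # inside interface (assumption)
WAN_IF="${WAN_IF:-eth1}"               # outside interface (assumption)
LAN_NET="${LAN_NET:-192.168.0.0/24}"   # local subnet (assumption)

setup_router() {
    # Turn on packet forwarding between the interfaces.
    $RUN sysctl -w net.ipv4.ip_forward=1
    # Drop forwarded packets unless a rule below allows them.
    $RUN iptables -P FORWARD DROP
    $RUN iptables -A FORWARD -i "$LAN_IF" -o "$WAN_IF" --source "$LAN_NET" -j ACCEPT
    $RUN iptables -A FORWARD -i "$WAN_IF" -o "$LAN_IF" --destination "$LAN_NET" -j ACCEPT
}

setup_router
```

Keeping the interface names in variables means the same script can be reused when the cables end up the other way around.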
Idea of bandwidth management
----------------------------
Bandwidth management can be done for different reasons. What we want from our router is that it would take traffic priority into account (so that huge downloads wouldn't suffocate web browsing or online gaming) and be fair (on a shared connection, you might desire that everyone gets the same bandwidth - after all you are all paying the same money).
To accomplish this, queues and classes are used. Before a packet is sent out to the internet, it is analyzed and, based on a set of conditions, ends up in one of many queues. These queues are connected to a hierarchy of classes, forming something like a tree. After landing in its final destination (final destination on the tree, as its real final destination is some other box) the packet moves up the hierarchy until it reaches the interface and is sent out.

This image is explained further, from a different angle, under the Linux kernel topic.
Technically, traffic shaping can be used on both incoming and outgoing packets, but it is typically implemented just on the outgoing ones. This is because you can't control what is sent to you, and there is little point in building an incoming queue and keeping packets in there even though they could be dealt with right away (remember, outgoing packets can leave only at the connection speed and so they queue up, while incoming packets have already come in at connection speed). Just to give a complete picture, sometimes incoming packets are dropped to reduce the speed of some connections (this is called policing). Usually, shaping incoming packets is done indirectly through the outgoing queues (by scheduling TCP ACK packets).
As we have a router, we don't have to deal with this. As we don't really get any packets for the router itself, everything that is received is sent out of the other end. This makes things easier for us, since we can actually shape incoming traffic when it wants to leave through the other interface.
Usually you want things to be fair - you all pay the same money so everyone should get the same slice of the connection, unless some of it is left over (in which case you would probably want to distribute it equally among everybody who needs it). This is pretty easy to accomplish.
All you need to do is create a class for every client. Unless you have tens or hundreds of them, you can do it by hand. You specify each class's bandwidth in proportion to the amount that client is paying. If you set the ceiling (how far the class can go by borrowing bandwidth) to the full capacity of the link, any bandwidth left over is distributed among those who can use it (usually a good idea).
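As a rough illustration of "bandwidth in proportion to payment", the arithmetic could look like this. The client list, the 240kbit link rate and the helper name share_rate are made up for the example, and the tc syntax it prints is the one explained later in this article. Nothing is applied, the commands are only printed.

```shell
#!/bin/sh
# Split a link between clients in proportion to what each one pays (sketch).
LINK_KBIT=240            # usable link rate in kbit (assumption)

# share_rate LINK SHARES TOTAL_SHARES -> guaranteed rate in kbit, rounded down
share_rate() {
    echo $(( $1 * $2 / $3 ))
}

# Hypothetical clients written as "class-minor:shares"; the first one pays double.
CLIENTS="11:2 12:1 13:1"

# Add up all the shares.
total=0
for c in $CLIENTS; do
    total=$(( total + ${c#*:} ))
done

# Print one class per client. ceil is the full link, so leftovers can be borrowed.
for c in $CLIENTS; do
    rate=$(share_rate "$LINK_KBIT" "${c#*:}" "$total")
    echo "tc class add dev eth0 parent 1:1 classid 1:${c%%:*} htb rate ${rate}kbit ceil ${LINK_KBIT}kbit"
done
```

With these made-up numbers the double payer gets 120kbit guaranteed and the other two get 60kbit each, and all three can grow up to 240kbit when the others are idle.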
Now, this was fair handling of bandwidth. It's a good thing to do, but it gets better. Have you ever thought, while downloading something big, that it would be so nice if the download didn't make surfing the web so slow, or ssh connections even slower? Can be arranged. Let's talk about types of traffic.
There are several types of traffic on any network and not all of them have the same characteristics or the same needs. So, what types of traffic do we have? There is interactive traffic (ssh, telnet, ftp-control, maybe irc and muds.. everything where it is important to have little delay.. web surfing is also interactive) and there is bulk traffic (ftp-data, p2p, downloading from the web). You can also treat "really-minimal-delay-traffic" differently (interactive might be better referred to as "human-interactive") - like DNS requests and TCP ACK packets, maybe some ICMP types. The usual way of doing general (and already pretty effective) bandwidth management is prioritizing them in this order:
minimal-delay - interactive - bulk
Whenever there are minimal-delay packets, they are sent first. If there are none, interactive packets are sent, and if there are none of those either, bulk traffic is passed through (this is roughly what happens. Actually, all these classes would have their guaranteed bandwidth, so that a whole shitload of ACK packets could not stop web surfing).
I'll explain why this is good. ACK and DNS packets are small and don't use much bandwidth, but quite often other services wait for them (to get a web page, you need to get its IP first, with DNS), so it would be wise to let them in front of other stuff (keeping in mind that, because of their small bandwidth usage, they won't slow anything down either). Now your interactive traffic - there's usually more of it - you would probably like it if web pages loaded in 2 seconds instead of 8. On saturated links, prioritizing it over bulk transfers can do that. And your bulk transfers themselves - this prioritizing doesn't really affect them. You will get those web pages anyway, and those ssh sessions. You will definitely use all the bandwidth you need for interactive traffic and it doesn't make any difference whether some packets of a huge file were sent before or after an HTTP GET request.
For example, I had eMule and Soulseek running on one of the boxes, and they completely filled the uplink. This meant that ssh'ing to the router/server meant more than a second of delay for everything, every keystroke. Not my idea of fun. So, I added a new class to the bandwidth management system for ssh and gave it priority over everything else. This reduced the delay to a couple of hundred milliseconds, which is a pretty normal round-trip time on this particular network.
Also, a con of some magnitude: you have to create a bottleneck for bandwidth management to work. If you are not the slowest link in the chain, queues will grow somewhere else (your ADSL modem, the ISP's hardware) and you won't be able to prioritize the traffic. This means that for a 640/256 connection you should probably turn it down to 624/240 to be sure that you indeed are the one shaping the traffic. If your link speed is relatively stable, you could try to cut it back less, but the literature generally suggests 90-95% of maximum speed.
This is particularly important for DSL users. You may have noticed that uploading absolutely kills any interactivity whatsoever. This is because DSL modems usually have a VERY BIG buffer to get better throughput. This buffer, however, takes a very long time to run through. I did a test on a saturated uplink, first without limiting bandwidth and then with a limit. Ping dropped from 650ms to 280ms. Then, after I gave ICMP priority over other traffic, it went down to 96ms. For a complete picture, with an empty uplink the ping is 50ms. The downlink was emptyish during the test (there was some traffic because of the upload).
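If you would rather work from a percentage than from a flat cut, the shaped rate is simple arithmetic. The helper name shaped_rate and the 94% figure are only assumptions for this sketch. The 624/240 figures above are a flat 16kbit cut, which is a bit gentler than this on the downlink.

```shell
#!/bin/sh
# shaped_rate LINE_KBIT PERCENT -> rate to configure on the shaper, in kbit (rounded down)
shaped_rate() {
    echo $(( $1 * $2 / 100 ))
}

# A 640/256 line shaped to 94% of its nominal speed (the percentage is an assumption).
echo "down: $(shaped_rate 640 94)kbit"
echo "up:   $(shaped_rate 256 94)kbit"
```

Start low, watch the ping under load, and raise the percentage only as long as the delay stays down.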
Kernel bandwidth management
---------------------------
First I'll explain some terminology. You will probably need to check back here later in the chapter, since this theoretical talk isn't very clear just by itself.
Qdisc - queueing discipline. These are used to give traffic classes different behaviour. The Linux kernel supports many different qdiscs, all of which operate in different ways and are meant for different situations. We will use only two.
Class - some qdiscs are classful and can have a hierarchy - sort of a tree under them. These trees are composed of classes.
Filter - a filter examines a packet and sends it somewhere down (or possibly up, left, right, etc) the hierarchy.
HTB qdisc - Hierarchical token bucket. This is a classful qdisc, which means it can have "subclasses and sub-qdiscs". It is useful for sending different packets to different branches of your qdisc-tree - branches that have different priority and bandwidth. The subclasses that you create are able to share leftover bandwidth amongst each other (all children of the same class can borrow bandwidth from their siblings, but not from any other class).
SFQ - Stochastic fairness queueing. This only has an effect when a link is full. What it does is try to separate each "connection" and give each of them an equal share of the available bandwidth. It does not affect the traffic at all when the link isn't full.
In the kernel, each qdisc gets a "major number" and each class gets a "minor number". For example, the root qdisc could be 1:0, where 1 is the major (qdisc) number and 0 is the minor (class) number (minor 0 means the qdisc itself in the Linux kernel). Since all classes belong to a qdisc, they are referenced as, say, 1:10, where 1 is the major number (the qdisc) and 10 is a class of this specific qdisc (all qdiscs can have a class 10: 1:10, 2:10, 10:10, etc). Qdiscs can be referenced either as 1:0 or just 1:.
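One detail that is easy to trip over: tc reads both halves of a handle as hexadecimal numbers, so class 1:10 is really minor number 16. The little helper below (split_handle is a made-up name for this sketch) splits a handle into its major and minor parts and prints them in decimal.

```shell
#!/bin/sh
# split_handle HANDLE -> prints "major minor" in decimal.
# Both halves of a tc handle are hexadecimal; a missing minor means 0 (the qdisc itself).
split_handle() {
    major="${1%%:*}"     # everything before the colon
    minor="${1#*:}"      # everything after the colon (may be empty)
    echo "$(( 0x${major} )) $(( 0x${minor:-0} ))"
}

split_handle 1:      # the qdisc itself, same as 1:0
split_handle 1:10    # class 10 of qdisc 1 (hex 10, decimal 16)
```

This rarely matters as long as you stick to digits, but it explains why handles like 1:a or 1:ff are perfectly valid.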
So, all interfaces (NICs) have a root qdisc by default (0:). When you delete your root qdisc, a default one is put in its place (all children and filters are removed), so there is always some kind of queueing discipline in place.
As a sort of preliminary example, let's say you have two machines and you want them both to get the same bandwidth, period. For this, you could add a new HTB qdisc to root (1:0), add a class to it (1:1) that has the rate of your connection, let's say 256kbit, and two subclasses (1:10 and 1:15, under 1:1) for the machines, with rates of 128kbit each. To the leaf subclasses you would attach SFQ qdiscs (10:0, 15:0) so that the bandwidth is divided equally among the connections from a particular machine.

This brings us to filters. Filters can be attached to any class or qdisc to direct where the packet should go. In our case, we would insert two filters into qdisc 1:0 that put the packets into class 1:10 or 1:15 (and so into the SFQ qdiscs 10: and 15: attached to them) depending on the packet's IP. Let's say 192.168.0.1 goes to 1:10 and 192.168.0.2 goes to 1:15. Note that filters can send packets straight from the root to leaf classes (filters do not depend on the hierarchy in any way) as well as to an intermediate class.
So, it is sort of like a tree.
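Written out as tc commands, the two-machine tree described above might look roughly like this. The syntax is explained step by step in the next section. This is a sketch under a few assumptions: the interface is eth0, the TC variable and the function name are made up, and matching on the source address only works if the local addresses are still visible on this interface (no NAT rewriting them first). By default the commands are only printed.

```shell
#!/bin/sh
# The two-machine example as tc commands (sketch).
# TC defaults to "echo tc", so commands are printed; run with TC=tc as root to apply.
TC="${TC:-echo tc}"
DEV="${DEV:-eth0}"   # interface being shaped (assumption)

two_host_tree() {
    # Root HTB qdisc and one class holding the whole 256kbit link.
    $TC qdisc add dev "$DEV" root handle 1:0 htb
    $TC class add dev "$DEV" parent 1:0 classid 1:1 htb rate 256kbit
    # One class per machine. No ceil given, so neither can go above 128kbit.
    $TC class add dev "$DEV" parent 1:1 classid 1:10 htb rate 128kbit
    $TC class add dev "$DEV" parent 1:1 classid 1:15 htb rate 128kbit
    # SFQ on each leaf, to be fair between one machine's connections.
    $TC qdisc add dev "$DEV" parent 1:10 handle 10:0 sfq perturb 10
    $TC qdisc add dev "$DEV" parent 1:15 handle 15:0 sfq perturb 10
    # Filters on the root qdisc pick the class by source address.
    $TC filter add dev "$DEV" protocol ip parent 1:0 prio 1 u32 match ip src 192.168.0.1/32 flowid 1:10
    $TC filter add dev "$DEV" protocol ip parent 1:0 prio 1 u32 match ip src 192.168.0.2/32 flowid 1:15
}

two_host_tree
```

Traffic that matches neither filter is not shaped at all in this sketch, since the HTB qdisc has no default class here.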
Prioritizing example
--------------------
To put all this information to use, let's build a simple traffic shaper for ourselves, shall we? Let's shape the internet connection of my DSL line and first build separate classes for SSH, HTTP and OTHER SHIT - a prioritizing setup.
Creating and modifying qdiscs, classes and filters is done with the "tc" tool from the iproute2 package. First let's see if the default qdisc is in place (i.e., we can start from scratch).
teener:~# tc -s -d qdisc show dev eth0
qdisc pfifo_fast 0: Unknown qdisc, optlen=20
Sent 2098548844 bytes 3720776 pkts (dropped 0, overlimits 0)
Here it is. So, what we do first is create the root qdisc and the main class, under which to connect other classes.
teener:~# tc qdisc add dev eth0 root handle 1:0 htb default 15
teener:~# tc class add dev eth0 parent 1:0 classid 1:1 htb rate 240kbit
Running 'man tc' or 'tc help' or 'tc qdisc help' or 'tc qdisc add htb help' can give you the definition of tc's syntax, but I'll explain the meaning of it as well.
'tc qdisc add' adds a qdisc ('tc class add' and 'tc filter add' do the same for classes and filters), as I suppose you might have guessed. 'dev eth0' specifies the network interface to work with (as most routers have many). The next argument, 'parent 1:0', specifies the parent in the hierarchy. 'root' is just a special parent and means the absolute top of the hierarchy. 'handle 1:0' and 'classid 1:1' give a name to the qdisc or class respectively. Lastly, you define the type of the class or qdisc, in this case htb. Everything after this point is arguments for the qdisc or class.
The htb argument 'default 15' means that if no filter sends the traffic elsewhere, it should be sent to subclass 15. 'rate 240kbit', as you probably guessed, describes the bandwidth of this class, in kilobits per second.
Now, let's create subclasses for all the different kinds of traffic we plan to discriminate in one way or another.
teener:~# tc class add dev eth0 parent 1:1 classid 1:5 htb rate 72kbit ceil 240kbit prio 2
teener:~# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 120kbit ceil 240kbit prio 6
teener:~# tc class add dev eth0 parent 1:1 classid 1:15 htb rate 48kbit ceil 240kbit prio 10
1:5 is for SSH, 1:10 is for HTTP and 1:15 is for everything else (the default class we defined above). The new htb argument 'ceil 240kbit' sets how high (in total) the class can go by borrowing from its siblings. Because there is no reason (in this case) to keep unused bandwidth, we let the classes borrow whatever is left over. 'prio n' gives the class a priority. Packets from classes with a smaller prio number are sent out first whenever the interface is ready to transmit another packet. I used this somewhat strange numbering (2, 6, 10) to allow for later insertion of other classes - before, after and between the current ones.
Next, we add qdiscs to the leaf classes. We will be using the sfq qdisc, which gives fairness on saturated links.
teener:~# tc qdisc add dev eth0 parent 1:5 handle 5:0 sfq perturb 10
teener:~# tc qdisc add dev eth0 parent 1:10 handle 10:0 sfq perturb 10
teener:~# tc qdisc add dev eth0 parent 1:15 handle 15:0 sfq perturb 10
The sfq argument 'perturb 10' means that it will reconfigure its hashing every 10 seconds, so that if some queue got more connections than another, this unfairness wouldn't stay for long. (sfq uses hashing to distribute connections among queues, since it has a theoretically unlimited, or at least very large, number of connections to divide between a limited number of queues.)
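To get a feel for what hashing with a changing seed does, here is a toy model. It is NOT the kernel's hash function: it simply runs a made-up flow description plus a seed through cksum and takes the remainder. The bucket count, the flow strings and the function name are all assumptions for the illustration.

```shell
#!/bin/sh
# Toy model of the SFQ idea: hash each flow into one of a few buckets and
# change the hash seed (the "perturbation") now and then.
# This is only an illustration using cksum, not what the kernel computes.
BUCKETS=8

# bucket_for FLOW SEED -> bucket number in 0..BUCKETS-1
bucket_for() {
    sum=$(printf '%s|%s' "$1" "$2" | cksum | cut -d' ' -f1)
    echo $(( sum % BUCKETS ))
}

# Two hypothetical flows, hashed with two different seeds.
for seed in 1 2; do
    for flow in "192.168.0.2:40001>10.0.0.1:80" "192.168.0.3:40002>10.0.0.1:80"; do
        echo "seed $seed: $flow -> bucket $(bucket_for "$flow" "$seed")"
    done
done
```

Two flows that happen to share a bucket under one seed will most likely be split up under the next one, which is the whole point of perturbing.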
Now all we have left are the filters:
teener:~# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 22 0xffff flowid 1:5
teener:~# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip sport 22 0xffff flowid 1:5
teener:~# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 80 0xffff flowid 1:10
For every filter you have to define what protocol it will be filtering. This is the 'protocol ip' bit (remember, tcp, udp, icmp - all belong to ip). The 'prio n' works in a similar manner to the htb prio - filters with smaller prio numbers are tried first. 'u32' is the main filter used (I also use fw, which matches packets marked with iptables).
Its arguments - 'match ip' allows us to match by ip field names (it is also possible to just match some specific bits at some offset in the packet), '(d/s)port xx 0xffff' checks the destination or source port respectively, '0xffff' being the mask (we want the port to be exactly 22 or 80, so the mask has all bits up). 'flowid M:m' tells the filter where to send a matched packet (the classid of a previously defined class).
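The mask is easier to understand with a few numbers. The helper below (port_matches is a made-up name) does the same kind of comparison as a masked u32 port match: both the packet's port and the wanted value are ANDed with the mask and then compared.

```shell
#!/bin/sh
# port_matches PORT VALUE MASK -> succeeds if (PORT & MASK) == (VALUE & MASK)
port_matches() {
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# With all 16 bits set in the mask, only the exact port matches.
port_matches 22 22 0xffff && echo "22 matches 22/0xffff"
port_matches 23 22 0xffff || echo "23 does not match 22/0xffff"
# A shorter mask ignores the low bits, so 80/0xfff0 covers ports 80 to 95.
port_matches 95 80 0xfff0 && echo "95 matches 80/0xfff0"
port_matches 96 80 0xfff0 || echo "96 does not match 80/0xfff0"
```

So a mask other than 0xffff is a cheap way to catch a whole block of ports with one filter, as long as the block lines up on a power of two.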
Fair example
------------
Now, let's try a fair setup, discriminating by IP address. Let's see what's going on:
teener:~# tc -s -d qdisc show dev eth0
... (output removed) ...
qdisc htb 1: r2q 10 default 15 direct_packets_stat 32847 ver 3.13
Sent 5236631433 bytes 7201014 pkts (dropped 15171, overlimits 7790900)
backlog 45p
In my case, the old shaping setup is still there, so I have to remove it before trying something new
teener:~# tc qdisc del dev eth0 root
Now, as last time, we create a root qdisc and a class under it (the class is important for borrowing - all direct children of a CLASS can share bandwidth).
teener:~# tc qdisc add dev eth0 root handle 1:0 htb default 15
teener:~# tc class add dev eth0 parent 1:0 classid 1:1 htb rate 240kbit
Add classes and leaf qdiscs for all the resident boxes and one for guest boxes (1:15)
teener:~# tc class add dev eth0 parent 1:1 classid 1:11 htb rate 48kbit ceil 240kbit
teener:~# tc class add dev eth0 parent 1:1 classid 1:12 htb rate 48kbit ceil 240kbit
... (removed some input) ...
teener:~# tc qdisc add dev eth0 parent 1:14 handle 14:0 sfq perturb 10
teener:~# tc qdisc add dev eth0 parent 1:15 handle 15:0 sfq perturb 10
I set all the classes to rate 48kbit, but you could of course have limits specific to each box.
Now, to cover more scenarios, let's do the filters with the help of iptables, by marking packets. So, the iptables rules for this:
teener:~# iptables -I OUTPUT -t mangle -o eth0 --source 194.204.48.52 -j MARK --set-mark 1
teener:~# iptables -I FORWARD -t mangle -i eth2 -o eth0 --source 192.168.0.2 -j MARK --set-mark 2
teener:~# iptables -I FORWARD -t mangle -i eth2 -o eth0 --source 192.168.0.3 -j MARK --set-mark 3
teener:~# iptables -I FORWARD -t mangle -i eth2 -o eth0 --source 192.168.0.4 -j MARK --set-mark 4
As you can guess, we can mark packets by any match available in iptables. This means a pretty wide array of interesting things (connection tracking, MAC addresses, odd things like STRING matches). Anyway, the first rule is for the router itself, because in my case it works as a part-time server and therefore has some traffic of its own as well.
The matching set of tc filters
teener:~# tc filter add dev eth0 protocol ip parent 1:0 prio 1 handle 1 fw flowid 1:11
teener:~# tc filter add dev eth0 protocol ip parent 1:0 prio 1 handle 2 fw flowid 1:12
teener:~# tc filter add dev eth0 protocol ip parent 1:0 prio 1 handle 3 fw flowid 1:13
teener:~# tc filter add dev eth0 protocol ip parent 1:0 prio 1 handle 4 fw flowid 1:14
The 'handle x' part, new in context of filters, has different meanings depending on the type of filter. With fw filter it means the value of mark. 'flowid M:m' directs the traffic to appropriate classes.
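Since each mark has to line up with exactly one fw filter, it can help to emit the two commands as a pair. A sketch, using the addresses, marks and class ids from the rules above; the RUN variable and the mark_host function are made up for illustration, and by default the commands are only printed.

```shell
#!/bin/sh
# Keep each iptables mark and its tc fw filter together (sketch).
# RUN defaults to "echo"; run as root with RUN= (empty) to apply.
RUN="${RUN-echo}"
WAN_IF="${WAN_IF:-eth0}"   # outgoing (shaped) interface, as in the example above
LAN_IF="${LAN_IF:-eth2}"   # interface the resident boxes sit behind

# mark_host IP MARK CLASSID
# Marks forwarded packets from IP, then sends packets with that mark to CLASSID.
mark_host() {
    $RUN iptables -t mangle -I FORWARD -i "$LAN_IF" -o "$WAN_IF" --source "$1" -j MARK --set-mark "$2"
    $RUN tc filter add dev "$WAN_IF" protocol ip parent 1:0 prio 1 handle "$2" fw flowid "$3"
}

mark_host 192.168.0.2 2 1:12
mark_host 192.168.0.3 3 1:13
mark_host 192.168.0.4 4 1:14
```

The router's own traffic goes through the OUTPUT chain rather than FORWARD, so it still needs its own rule as shown above.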
Final comments
--------------
The two examples above could quite easily be combined to suit most situations for smaller (and possibly some larger) networks. To combine fairness and prioritizing, you first create the classes for different boxes (or classes of boxes) and then treat those classes as if they were the main class in the prioritizing example. As a result, you should get a tree with 5 branches at the root, each one with 3 branches of its own. There is no really effective way to do this, so you will just have to do some iterative shell scripting or develop functions that add different templates under classes. Or you can hand-tailor it for perfection.
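One possible shape for such a script is sketched below. Everything specific in it is an assumption: the five host addresses, the class numbering (1:N0 for host N, with 1:N1, 1:N2 and 1:N3 under it), the 30/50/20 split between SSH, HTTP and the rest, and the function names. It uses u32 source matches for brevity, which only works where the local addresses are visible on the shaped interface; with NAT you would use iptables marks and fw filters instead. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: one class per host, each with ssh/http/other subclasses.
# TC defaults to "echo tc" (print only); run with TC=tc as root to apply.
TC="${TC:-echo tc}"
DEV="${DEV:-eth0}"       # shaped interface (assumption)
LINK=240                 # shaped link rate in kbit
HOSTS="192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.5"  # hypothetical

# add_leaf PARENT MINOR RATE PRIO: a leaf class plus its sfq qdisc
add_leaf() {
    $TC class add dev "$DEV" parent "$1" classid "1:$2" htb rate "${3}kbit" ceil "${LINK}kbit" prio "$4"
    $TC qdisc add dev "$DEV" parent "1:$2" handle "$2:0" sfq perturb 10
}

# add_host N IP RATE: host class 1:N0 with children 1:N1 (ssh), 1:N2 (http), 1:N3 (rest)
add_host() {
    $TC class add dev "$DEV" parent 1:1 classid "1:${1}0" htb rate "${3}kbit" ceil "${LINK}kbit"
    add_leaf "1:${1}0" "${1}1" $(( $3 * 30 / 100 )) 2
    add_leaf "1:${1}0" "${1}2" $(( $3 * 50 / 100 )) 6
    add_leaf "1:${1}0" "${1}3" $(( $3 * 20 / 100 )) 10
    # Port filters first (prio 1), then a catch-all for this host (prio 2).
    $TC filter add dev "$DEV" protocol ip parent 1:0 prio 1 u32 match ip src "$2/32" match ip dport 22 0xffff flowid "1:${1}1"
    $TC filter add dev "$DEV" protocol ip parent 1:0 prio 1 u32 match ip src "$2/32" match ip dport 80 0xffff flowid "1:${1}2"
    $TC filter add dev "$DEV" protocol ip parent 1:0 prio 2 u32 match ip src "$2/32" flowid "1:${1}3"
}

build_tree() {
    $TC qdisc add dev "$DEV" root handle 1:0 htb
    $TC class add dev "$DEV" parent 1:0 classid 1:1 htb rate "${LINK}kbit"
    # Count the hosts so the link can be split evenly between them.
    count=0
    for ip in $HOSTS; do count=$(( count + 1 )); done
    n=0
    for ip in $HOSTS; do
        n=$(( n + 1 ))
        add_host "$n" "$ip" $(( LINK / count ))
    done
}

build_tree
```

Traffic that matches none of the filters (the router's own, for instance) is left unshaped in this sketch, so a real setup would want a default class for it as in the examples above.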
Before actually building a real traffic shaping solution, you should probably profile your bandwidth usage. See what applications are used, what their requirements are (reliability, throughput, delay) and how to 'detect' them (source/destination ports and TOS fields should cover almost any case). Just tinkering and adjusting things all around the place could work, but any serious IT person will happily preach the importance of analysis and documentation and sticking to the project plan and whatnot. And they are probably right too, so do that.
I do hope this text was helpful in developing some understanding of bandwidth management, with Linux and in general as well. I personally believe traffic shaping is a wonderful way to get the link running more smoothly and effectively. Since it has few drawbacks (having to create a bottleneck, an extra piece of hardware lying around) and pretty neat payoffs, it's definitely something that belongs on your network.
Links
-----
http://lartc.org/lartc.html - "Linux Advanced Routing & Traffic Control HOWTO"
http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm - "HTB Linux queuing discipline manual - user guide"
http://qos.ittc.ukans.edu/howto/index.html - "Linux - Advanced Networking Overview"
http://www.google.com/ - "Google - Searching 4,285,199,774 web pages"
16.05.2004