Fixing home ADSL latency under load with fq_codel
Around the time my Billion ADSL modem (5+ years old) went on the blink, I was listening to the Packet Pushers Podcast episode "Improve your home internet performance using CoDel" and figured it was worth experimenting with fq_codel to see how it would affect my home connection.
The gist is that fq_codel is an algorithm to deal with the issues associated with buffer bloat, whereby excess buffering causes high latency spikes and poor overall network performance.
So I started looking at replacement ADSL routers for home that would support fq_codel. After a little research I was looking at getting a basic ADSL modem (e.g. a TP-Link TD-8817 or W8968 or similar) running in bridge mode and then putting a router running DD-WRT or OpenWRT behind it to provide the fq_codel capability - there didn't seem to be many ADSL routers that natively supported fq_codel.
After looking around for a router that I could install OpenWRT on, I ended up finding the Ubiquiti EdgeRouter X, which supports fq_codel out of the box. So I went the easy path and picked one up so I wouldn't have to go through the hassle of getting a router and flashing it with OpenWRT.
To start with I replaced the modem with a TP-Link W8968 ADSL modem/router and configured it so that it was working as a router (with WiFi disabled).
Latency Test Scripts
To see the effects of buffer bloat I used two methods:
the speed tests from DSLReports which measure latency (and give a bufferbloat score)
a little script I wrote that starts a ping test and then, every 5 seconds, starts another upload and download task using curl. I then just manually monitor the ping response time. Nothing fancy, it just looks like:
# Start a continuous ping in the background to watch latency
ping www.internode.on.net &
sleep 5
echo Starting first stream
# Download stream: discard the body and print the average download speed when done
curl -s -o /dev/null http://speedcheck.cdn.on.net/50meg.test -w "%{speed_download}\n" &
# Upload stream: POST the local file 10meg.test and print the average upload speed when done
curl --connect-timeout 8 -F "file=@10meg.test" http://128.199.65.191/webtests/ul.php -w "%{speed_upload}\n" -s -o /dev/null &
sleep 5
echo Starting second stream
curl -s -o /dev/null http://speedcheck.cdn.on.net/50meg.test -w "%{speed_download}\n" &
curl --connect-timeout 8 -F "file=@10meg.test" http://128.199.65.191/webtests/ul.php -w "%{speed_upload}\n" -s -o /dev/null &
sleep 5
echo Starting third stream
curl -s -o /dev/null http://speedcheck.cdn.on.net/50meg.test -w "%{speed_download}\n" &
curl --connect-timeout 8 -F "file=@10meg.test" http://128.199.65.191/webtests/ul.php -w "%{speed_upload}\n" -s -o /dev/null &
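If eyeballing the ping output gets tedious, the round-trip times can be summarised with a small helper. This is only a sketch: the function name summarise_ping is made up for illustration, and it assumes the ping output contains "time=<number> ms" fields in the format shown later in this post.

```shell
# Hypothetical helper: read ping output on stdin and print the number of
# replies plus the minimum, average and maximum round-trip time in ms.
# Lines without a "time=" field (banner, "Starting ... stream") are ignored.
summarise_ping() {
  awk -F'time=' '/time=/ {
      split($2, parts, " ")   # parts[1] is the numeric RTT, parts[2] is "ms"
      rtt = parts[1] + 0
      if (n == 0 || rtt < min) min = rtt
      if (n == 0 || rtt > max) max = rtt
      sum += rtt
      n++
    }
    END {
      if (n > 0) printf "n=%d min=%.1f avg=%.1f max=%.1f\n", n, min, sum / n, max
      else print "no samples"
    }'
}
```

For example, `ping -c 20 www.internode.on.net | summarise_ping` prints a one-line summary once the 20 pings have completed. Running it once on an idle line and once while the streams are active gives two numbers to compare.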
Testing with no fq_codel
In the configuration using just the TP-Link W8968 with a laptop connected directly to one of its switch ports (i.e. without any fq_codel support enabled), I got results like:
which wasn't too bad - the latency went from 30ms to 90ms under load. Where I really noticed it was when I ran the test script: latency for the pings reached as high as 3 seconds. Not very helpful for my VoIP phone call quality!
Testing with fq_codel
After this, I moved to the desired configuration, with the TP-Link modem in bridge mode connected to the EdgeRouter X with fq_codel enabled. To tune the fq_codel settings, I picked upload and download numbers close to what I got without limits and then worked up and down from there, testing each time to see the outcome. With upload set to 1390kbps and download set to 13800kbps, I was able to get results like this (note the A+ for bufferbloat):
But the real test was under load using the script, where the latency looks like:
PING www.internode.on.net (150.101.140.197): 56 data bytes
64 bytes from 150.101.140.197: icmp_seq=0 ttl=57 time=46.533 ms
64 bytes from 150.101.140.197: icmp_seq=1 ttl=57 time=46.341 ms
64 bytes from 150.101.140.197: icmp_seq=2 ttl=57 time=46.162 ms
64 bytes from 150.101.140.197: icmp_seq=3 ttl=57 time=46.399 ms
64 bytes from 150.101.140.197: icmp_seq=4 ttl=57 time=46.082 ms
Starting first stream
64 bytes from 150.101.140.197: icmp_seq=5 ttl=57 time=46.765 ms
64 bytes from 150.101.140.197: icmp_seq=6 ttl=57 time=94.550 ms
64 bytes from 150.101.140.197: icmp_seq=7 ttl=57 time=60.278 ms
64 bytes from 150.101.140.197: icmp_seq=8 ttl=57 time=59.467 ms
64 bytes from 150.101.140.197: icmp_seq=9 ttl=57 time=70.207 ms
Starting second stream
64 bytes from 150.101.140.197: icmp_seq=10 ttl=57 time=113.590 ms
64 bytes from 150.101.140.197: icmp_seq=11 ttl=57 time=64.240 ms
64 bytes from 150.101.140.197: icmp_seq=12 ttl=57 time=81.501 ms
64 bytes from 150.101.140.197: icmp_seq=13 ttl=57 time=81.129 ms
64 bytes from 150.101.140.197: icmp_seq=14 ttl=57 time=74.700 ms
Starting third stream
64 bytes from 150.101.140.197: icmp_seq=15 ttl=57 time=128.408 ms
64 bytes from 150.101.140.197: icmp_seq=16 ttl=57 time=66.781 ms
64 bytes from 150.101.140.197: icmp_seq=17 ttl=57 time=77.045 ms
64 bytes from 150.101.140.197: icmp_seq=18 ttl=57 time=65.625 ms
64 bytes from 150.101.140.197: icmp_seq=19 ttl=57 time=57.672 ms
Note that there is a spike in the latency for one ping immediately after the stream starts, but the router quickly adjusts and the latency comes back down to 20-30ms higher than what it was originally - with 3 full rate upload and download streams running simultaneously. This is compared to the 3 seconds I was getting prior to enabling fq_codel!
Admittedly, I had to give up some of the pure top end download speed to get the lower latencies (I did try higher upload/download settings for fq_codel and was able to get closer to the original speed, but the closer I got the higher the latency went). Now, though, I don't have to worry if I'm streaming TV or downloading stuff when the VoIP phone rings, as the latency stays low. Latency has more impact on me day to day than a slight loss of top end speed, so I'm happy to make the tradeoff.
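To make the rate setting a bit more concrete, here is a rough sketch of what an upload limit like 1390kbps amounts to on a generic Linux router using tc. This is not the EdgeRouter X configuration I used; the function name shape_upload and the interface name eth0 are assumptions for illustration. fq_codel does no rate limiting of its own, so the sketch puts an HTB shaper set slightly below the line rate above it, which keeps the queue in the router where fq_codel can manage it. Shaping the download direction usually means redirecting ingress traffic through an IFB device, which is not shown here.

```shell
# Hypothetical helper: print the tc commands that shape egress traffic on a
# WAN interface to a fixed rate with fq_codel underneath the shaper.
# By default this is a dry run; set APPLY=1 (as root) to run the commands.
shape_upload() {
  wan_if=$1      # assumed WAN interface name, e.g. eth0
  rate_kbit=$2   # shaped rate in kbit/s, e.g. 1390
  run=echo
  [ "${APPLY:-0}" = "1" ] && run=""
  # Root HTB qdisc; unclassified traffic falls into class 1:10
  $run tc qdisc replace dev "$wan_if" root handle 1: htb default 10
  # Single class that caps all egress traffic at the chosen rate
  $run tc class replace dev "$wan_if" parent 1: classid 1:10 htb rate "${rate_kbit}kbit"
  # fq_codel manages the queue that builds up behind the shaper
  $run tc qdisc replace dev "$wan_if" parent 1:10 handle 10: fq_codel
}
```

Calling `shape_upload eth0 1390` prints the three commands so they can be reviewed before anything is changed. Raising the rate recovers some throughput but, as I found, lets latency creep back up once it gets too close to what the line can actually deliver.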
If you haven't looked at how buffer bloat and latency are impacting your connection, I'd suggest running a couple of tests; you may also find it worthwhile investigating fq_codel for your connection.