* MikroTik's RouterOS v7 uses an older Linux kernel with an out-of-date, and still quite buggy, implementation of CAKE. XDP-based hardware acceleration is not possible on it.
* A middle-box x86 device running LibreQoS can push up to 4Gbps of traffic per CPU core, allowing you to shape by Site or by AP at up to 4Gbps each, with aggregate throughput of 10Gbps or more. On ARM-based MikroTik routers, the most traffic you can put through a single HTB and CPU core is probably closer to 2Gbps. HTBs suffer from queue locking: CPU use will look as though all cores are evenly sharing the load, but in reality a single qdisc lock on the first CPU core (which handles scheduling for the other CPU threads) bottlenecks all HTB throughput. LibreQoS works around that qdisc locking problem with XDP-CPUMAP-TC, which uses XDP and MQ to run a separate HTB instance on each CPU core (see the sketch after this list). That is not available on MikroTik, so hierarchical queuing on MikroTik is bottlenecked in this way.
* Routing on the same device that applies CPU-intensive queues such as fq-codel and CAKE greatly increases CPU use, limiting throughput and introducing more latency and jitter for end users than a middle-box running LibreQoS would.
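To make that workaround concrete, here is a minimal sketch (not LibreQoS source code; the interface name is a placeholder, and it assumes the NIC exposes one TX queue per CPU core) of the qdisc layout XDP-CPUMAP-TC enables: an `mq` root qdisc with an independent HTB instance attached to each hardware queue, so each core holds its own qdisc lock.

```python
import os
import subprocess

INTERFACE = "eth1"           # placeholder: your shaping interface
CORES = os.cpu_count() or 1  # assumes TX queues == CPU cores

def tc(*args: str) -> None:
    subprocess.run(["tc", *args], check=True)

# mq root qdisc: exposes one child class per hardware TX queue (:1, :2, ...)
tc("qdisc", "replace", "dev", INTERFACE, "root", "handle", "7FFF:", "mq")

# Attach an independent HTB instance to each mq child, so each CPU core
# contends only for its own qdisc lock instead of one shared root lock.
for core in range(1, CORES + 1):
    tc("qdisc", "add", "dev", INTERFACE, "parent", f"7FFF:{core:x}",
       "handle", f"{core:x}:", "htb", "default", "2")
```

XDP then steers each customer's packets to a fixed CPU core, so a core only ever touches the HTB instance it owns.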
## Why not just use Preseem or Paraqum?
* Preseem and Paraqum are great commercial products - certainly consider them if you want the features and support they provide.
* That said, the monthly expense of those programs could instead be put toward the active development of CAKE and fq_codel, the AQMs which are the underlying algorithms that make Preseem and Paraqum possible. For example, Dave Täht is one of the leading figures of the bufferbloat project. He currently works to improve implementations of fq_codel and CAKE, educate others about bufferbloat, and advocate for the standardization of those AQMs on hardware around the world. Every dollar contributed to Dave's Patreon will come back to ISPs 10-fold through improvements to fq_codel, CAKE, and the broader internet in general. If your ISP has benefited from LibreQoS, Preseem, or Paraqum, please [contribute to Dave's Patreon here.](https://www.patreon.com/dtaht) Our goal is to get Dave's Patreon to $5000 per month, so he can focus on CAKE and fq_codel full-time, especially on ISP-centric improvements. Just 50 ISPs contributing $100/month will make it happen.
CAKE and fq_codel are hybrid packet scheduler and Active Queue Management (AQM) algorithms. LibreQoS uses a Hierarchical Token Bucket (HTB) to direct each customer's traffic into its own queue, where it is then shaped using either CAKE or fq_codel. Each customer's bandwidth ceiling is controlled by the HTB, according to the customer's allocated plan bandwidth, as well as the available capacity of the customer's respective Access Point and Site.
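As a rough illustration of that structure (a sketch under assumed names and handles, not the project's actual code), each customer gets an HTB class whose `rate`/`ceil` come from their plan, with CAKE attached as the leaf qdisc:

```python
import subprocess

INTERFACE = "eth1"   # placeholder: the shaping interface
PARENT = "1:1"       # placeholder: the HTB class of this customer's AP/Site

def tc(*args: str) -> None:
    subprocess.run(["tc", *args], check=True)

def add_customer(classid: str, rate_mbps: int, ceil_mbps: int) -> None:
    # The HTB class enforces the customer's plan bandwidth ceiling...
    tc("class", "add", "dev", INTERFACE, "parent", PARENT,
       "classid", classid, "htb",
       "rate", f"{rate_mbps}mbit", "ceil", f"{ceil_mbps}mbit")
    # ...and CAKE, attached as the leaf qdisc, manages the queue itself,
    # keeping latency low within that ceiling.
    tc("qdisc", "add", "dev", INTERFACE, "parent", classid, "cake")

add_customer("1:5", rate_mbps=25, ceil_mbps=100)  # example plan numbers
```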
The impact of fq\_codel on a 3000Mbps connection vs. hard rate limiting: a 30x latency reduction.
>“FQ\_Codel provides great isolation... if you've got low-rate videoconferencing and low rate web traffic they never get dropped. A lot of issues with IW10 go away, because all the other traffic sees is the front of the queue. You don't know how big its window is, but you don't care because you are not affected by it. FQ\_Codel increases utilization across your entire networking fabric, especially for bidirectional traffic... If we're sticking code into boxes to deploy codel, don't do that. Deploy fq\_codel. It's just an across the board win.”
* The qdisc locking problem limits the throughput of the HTB used in v0.8 (solved in v0.9). Tested up to 4Gbps/500Mbps asymmetrical throughput using [Microsoft Ethr](https://github.com/microsoft/ethr) with n=500 streams. High quantities of small packets will reduce max throughput in practice.
* Linux tc hash tables can only handle [~4000 rules each.](https://stackoverflow.com/questions/21454155/linux-tc-u32-filters-strange-error) This limits total possible clients to 1000 in v0.8.
* [XDP-CPUMAP-TC](https://github.com/xdp-project/xdp-cpumap-tc) integration greatly improves throughput, allows many more IPv4 clients, and lowers CPU use. Latency is reduced by half on networks previously limited by the single-CPU / TC qdisc locking problem in v0.8.
* Tested up to 10Gbps asymmetrical throughput on a dedicated server (the lab only had a 10G router). v0.9 is estimated to be capable of 20Gbps-40Gbps asymmetrical throughput on a dedicated server with 12+ cores.
* Each Node / Access Point is tied to a queue and CPU core. Access Points are evenly distributed across CPUs (see the placement sketch after this list). Since each CPU core can usually only accommodate up to 4Gbps, ensure no single Node / Access Point will require more than 4Gbps of throughput.
* Not yet dual stack: clients can only be shaped by IPv4 address in v0.9. Once IPv6 support is added to [XDP-CPUMAP-TC](https://github.com/xdp-project/xdp-cpumap-tc), we can shape IPv6 as well.
* XDP's cpumap-redirect achieves higher throughput on a server with direct access to the NIC (where XDP offloading is possible) than in a VM with bridges (generic XDP).
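As a minimal sketch of the even distribution mentioned above (hypothetical node names and demand figures; the ~4Gbps per-core ceiling is the figure cited on this page), Access Points can be placed round-robin across cores, rejecting any single node that exceeds what one core can shape:

```python
import os

CORE_CAPACITY_MBPS = 4000  # rough per-core shaping ceiling cited above

def assign_nodes(nodes: dict, cores: int) -> dict:
    """Round-robin nodes (name -> required Mbps) onto CPU cores."""
    placement = {core: [] for core in range(cores)}
    for i, (name, mbps) in enumerate(nodes.items()):
        if mbps > CORE_CAPACITY_MBPS:
            raise ValueError(
                f"{name} needs {mbps} Mbps; split it so no single "
                f"node exceeds {CORE_CAPACITY_MBPS} Mbps per core")
        placement[i % cores].append(name)
    return placement

# Hypothetical APs and their peak demand in Mbps:
print(assign_nodes({"AP_A": 1800, "AP_B": 900, "AP_C": 2500},
                   os.cpu_count() or 1))
```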
* Can now shape by Site, in addition to by AP and by Client.
#### Considerations
* If you shape by Site, each Site is tied to a queue and CPU core. Sites are evenly distributed across CPUs. Since each CPU core can usually only accommodate up to 4Gbps, ensure no single Site will require more than 4Gbps of throughput.
* If you shape by Access Point, each Access Point is tied to a queue and CPU core. Access Points are evenly distributed across CPUs. Since each CPU core can usually only accommodate up to 4Gbps, ensure no single Access Point will require more than 4Gbps of throughput.
#### Limitations
* As with v0.9, not yet dual stack: clients can only be shaped by IPv4 address until IPv6 support is added to [XDP-CPUMAP-TC](https://github.com/xdp-project/xdp-cpumap-tc). Once that happens we can shape IPv6 as well.
* XDP's cpumap-redirect achieves higher throughput on a server with direct access to the NIC (where XDP offloading is possible) than in a VM with bridges (generic XDP).
* The network hierarchy can be mapped in the network.json file (see the example after this list). This allows for both simple network hierarchies (Site>AP>Client) and much more complex ones (Site>Site>Micro-PoP>AP>Site>AP>Client).
* Graphing of bandwidth to InfluxDB. Parses bandwidth data from the `tc -s qdisc show` command, minimizing CPU use (see the parsing sketch after this list).
* Graphing of TCP latency to InfluxDB, via [PPing](https://github.com/pollere/pping) integration.
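For illustration, a network.json describing a simple Site>AP hierarchy might look like the sketch below, written here as a Python dict for consistency with the other examples. The key names follow LibreQoS conventions but are shown as an assumption; verify the exact schema against the documentation for your version.

```python
import json

# Hypothetical topology: one Site containing one AP. Clients are mapped
# to these nodes via Shaper.csv rather than in this file.
network = {
    "Site_1": {
        "downloadBandwidthMbps": 5000,
        "uploadBandwidthMbps": 5000,
        "children": {
            "AP_A": {
                "downloadBandwidthMbps": 500,
                "uploadBandwidthMbps": 100
            }
        }
    }
}

with open("network.json", "w") as f:
    json.dump(network, f, indent=2)
```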
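The bandwidth graphing stays cheap because it reads counters the kernel already keeps instead of inspecting packets. A rough sketch of that approach (simplified parsing; real `tc` output varies by qdisc) is:

```python
import re
import subprocess

def qdisc_sent_bytes(interface: str) -> list:
    """Return the 'Sent N bytes' counter for every qdisc on an interface."""
    out = subprocess.run(
        ["tc", "-s", "qdisc", "show", "dev", interface],
        capture_output=True, text=True, check=True).stdout
    # Each qdisc's stats include a line like: "Sent 123456 bytes 789 pkt ..."
    return [int(n) for n in re.findall(r"Sent (\d+) bytes", out)]

# Sampling these counters periodically and diffing them yields per-queue
# throughput without touching any packets.
print(sum(qdisc_sent_bytes("eth1")))  # "eth1" is a placeholder
```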
#### Considerations
* Any top-level parent node is tied to a single CPU core. Top-level nodes are evenly distributed across CPUs. Since each CPU core can usually only accommodate up to 4Gbps, ensure no single top-level parent node will require more than 4Gbps of throughput.
#### Limitations
* As with v0.9 and v1.0, not yet dual stack: clients can only be shaped by IPv4 address until IPv6 support is added to [XDP-CPUMAP-TC](https://github.com/xdp-project/xdp-cpumap-tc). Once that happens we can shape IPv6 as well.
* XDP's cpumap-redirect achieves higher throughput on a server with direct access to the NIC (where XDP offloading is possible) than in a VM with bridges (generic XDP).
There is a rudimentary UISP integration included in v1.1-alpha.
Instead, you may want to use the [Rust-based UISP integration](https://github.com/thebracket/libre_qos_rs/tree/main/uisp_integration) developed by [@thebracket](https://github.com/thebracket/) for v1.1 and above.
[@thebracket](https://github.com/thebracket/) was kind enough to produce this great tool, which maps the actual network hierarchy to the network.json and Shaper.csv formats LibreQoS can use.
* One management network interface, completely separate from the traffic-shaping interfaces. Usually this would be the Ethernet interface built into the motherboard.
* NIC must have two or more interfaces for traffic shaping.
* NIC must have multiple RX/TX queues. [Here's how to check from the command line](https://serverfault.com/questions/772380/how-to-tell-if-nic-has-multiqueue-enabled); a small sketch for checking this appears after this list.
* Ubuntu Server 21.10 or above recommended. All guides assume Ubuntu Server 21.10. Ubuntu Desktop is not recommended as it uses NetworkManager instead of Netplan.
* v0.9+: Requires kernel version 5.9 or above for physical servers, and kernel version 5.14 or above for VMs.
* Choose a CPU with solid [single-thread performance](https://www.cpubenchmark.net/singleThread.html) within your budget. Generally speaking, any new CPU above $200 can probably handle shaping up to 2Gbps.
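One quick way to verify multiqueue support, shown here as a small Python sketch with a placeholder interface name, is to count the `rx-*`/`tx-*` entries the kernel exposes under `/sys/class/net/<interface>/queues`:

```python
from pathlib import Path

def queue_counts(interface: str) -> tuple:
    """Count the RX/TX queues a NIC exposes via sysfs."""
    qdir = Path("/sys/class/net") / interface / "queues"
    return (len(list(qdir.glob("rx-*"))), len(list(qdir.glob("tx-*"))))

rx, tx = queue_counts("eth1")  # "eth1" is a placeholder interface name
print(f"{rx} RX queues, {tx} TX queues")  # multiqueue means more than one
```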
LibreQoS makes great use of fq\_codel and CAKE, two open-source AQMs whose development is led by Dave Täht with contributions from dozens of others around the world. Without Dave's work and advocacy, there would be no LibreQoS, Preseem, or Paraqum.
If LibreQoS helps your network, please [contribute to Dave's Patreon.](https://www.patreon.com/dtaht) Donating just $0.20/sub/month ($100/month for 500 subs) comes out to 60% less than any proprietary solution, and you get to ensure the continued development and improvement of CAKE. Dave's work has been essential to improving internet connectivity around the world. Let's all pitch in to help his mission.
Special thanks to Dave Täht, Jesper Dangaard Brouer, Toke Høiland-Jørgensen, Kumar Kartikeya Dwivedi, Kathleen M. Nichols, Maxim Mikityanskiy, Yossi Kuperman, and Rony Efraim for their many contributions to the Linux networking stack. Thank you Phil Sutter, Bert Hubert, Gregory Maxwell, Remco van Mook, Martijn van Oosterhout, Paul B Schroeder, and Jasper Spaans for contributing to the guides and documentation listed below. Thanks to Leo Manuel Magpayo for his help improving documentation and for testing. Thanks to everyone on the [Bufferbloat mailing list](https://lists.bufferbloat.net/listinfo/) for your help and contributions.
# Made possible by
* [fq_codel and CAKE](https://www.bufferbloat.net/projects/)