
A quick guide on how to get better WireGuard performance out of the already awesome MikroTik RB5009 router, by playing with the queues and the CPU frequency settings.

A picture of the MikroTik RB5009UPr+S+IN

The router used throughout this article is the RB5009UPr+S+IN, and it is being used as a WireGuard node.

                                           +------------+                                               
                     +------------+        |            |        +----------------+                     
+---------------+    |            |   WG   |            |   WG   |                |    +---------------+
| iperf3 client +--->|   RB5009   +------->|  Internet  +------->|  Linux router  +--->| iperf3 server |
+---------------+    |            |        |            |        |                |    +---------------+
                     +------------+        |            |        +----------------+                     
                                           +------------+

Low throughput? On an RB5009?

There's no doubt that the RB5009 is a very capable router that would be a great fit for most use-cases, and I've happily been using one in my homelab for quite a few years now. As the homelab has evolved over time, so has its network setup, and I somewhat recently implemented a site-to-site VPN architecture based on WireGuard, using one of these routers as one of the endpoints to provide seamless connectivity for downstream devices. This went without hiccups for the most part, but when performing a speed-test over the tunneled link, the throughput was much lower than when testing without going through it. This was very intriguing, especially coming from a good 4-core ARM64 CPU.

A screenshot showing a parallelized iperf3 test hovering around 400Mbps, with CPU usage in the 50 to 70 percent range.

The MikroTik WireGuard documentation has a great site-to-site configuration example, if you ever need one.

As can be seen in the previous screenshot, the WireGuard performance was a bit all over the place, hovering inconsistently around the 400Mbps mark with a lot of retries and a lot of transmission drops. Meanwhile, neither the CPU nor the 1Gbps network link between the two endpoints was at its limit, and neither were the iperf3 nodes. Granted, an iperf3 test isn't the best tool for a real-world bandwidth benchmark, but it should still represent a best-case scenario, not a worst-case one. More tests, done over a direct WireGuard link between the iperf3 server and its client, highlighted that the bottleneck was more than likely the RB5009.

After testing a few classic fixes on the router to improve the throughput of this tunnel, such as tinkering with the MTU and checking whether Fast Track was correctly configured (and whether disabling it changed anything), all to no avail, I decided to broaden my research.
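
To give an idea of what those checks look like, here is a rough sketch of the commands involved; the wg-s2s interface name is just a placeholder for your own WireGuard interface, and this is a quick sanity check rather than a full troubleshooting guide.

# Check the MTU configured on the WireGuard interface (wg-s2s is a placeholder name)
/interface/wireguard/print detail where name=wg-s2s

# List the Fast Track firewall rule(s) and verify whether they are enabled
/ip/firewall/filter/print where action=fasttrack-connection

# Temporarily disable Fast Track to see if it changes anything
/ip/firewall/filter/disable [find action=fasttrack-connection]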

In search of a better queue

After looking through the different statistics that RouterOS exposes, I started investigating the high TX drops visible in the previous screenshot. As it had been taken on the child VLAN interface, I also checked the parent interface, and it showed quite a lot of transmission queue drops.
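
If you want to check the same counters on your own device, they can be printed from the console; the tx-queue-drop column is the interesting one here (the exact set of counters shown may vary between RouterOS versions).

# Print the traffic counters of every interface, including the various drop counters
/interface/print stats-detail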

As stated in the MikroTik documentation about Queues, "a queue is a collection of data packets collectively waiting to be transmitted by a network device using a pre-defined structure methodology". Here, in the case of the RB5009, the default queue for the problematic interface is only-hardware-queue. As packets seem to be pushed faster than the link can transmit them and the queue can hold them, it's time to search for a potential replacement!
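
Which queue type is assigned to each interface can be verified from the dedicated menu, where the port in question indeed showed only-hardware-queue:

# List the transmit queue type assigned to each interface
/queue/interface/print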

One queue type that I've seen recommended for WAN interfaces is multi-queue-ethernet-default, a FIFO (first in, first out) system supporting multiple transmit queues, with a default size of 50. A supposedly better one that I came across was FQ-Codel (Fair Queuing with Controlled Delay), which uses randomization to split traffic into multiple flows and then shares the bandwidth fairly between them; it was the one recommended in a blog article named Chasing MikroTik CHR bottleneck. Finally, after digging a bit more into the MikroTik Queue documentation, I found a more recent and smarter one named CAKE (Common Applications Kept Enhanced), which appears to be focused on user-friendliness and ease of configuration, all while reportedly staying efficient.

Let's all put them to the test, shall we?

A GIF showing a blonde woman saying 'only one way to find out'.

Testing different queue algorithms

On the MikroTik router, both only-hardware-queue and multi-queue-ethernet-default were already pre-configured by default, so I only added the fq-codel and the cake queue types, using the following commands:

/queue type
add kind=fq-codel name=fq-codel-ethernet-default
add cake-bandwidth=1000.0Mbps cake-nat=yes kind=cake name=cake
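
Adding a queue type doesn't change anything on its own: it still has to be assigned to the interface you want to test it on. As a rough example, and assuming the WAN port is ether1 (adapt the name to your own setup), switching between the candidates looks like this:

# Assign one of the new queue types to the physical WAN interface
/queue/interface/set ether1 queue=fq-codel-ethernet-default

# Switch to cake, or back to the default, between test runs
/queue/interface/set ether1 queue=cake
/queue/interface/set ether1 queue=only-hardware-queue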

Once added, it was time to test them all. The following results were obtained by averaging 5 consecutive 10-second iperf3 tests with parallelization set to 4 streams, during a period where the network was otherwise very lightly used.

The full iperf3 command used was iperf3 -c <server IP> -P 4 -t 10.
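
If you want to watch the router side while a test is running, a couple of read-only commands are enough; ether1 is again just a placeholder for the WAN interface.

# Live throughput on the WAN interface
/interface/monitor-traffic ether1

# Live overall CPU and memory usage
/system/resource/monitor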

                     | only-hardware-queue | multi-queue-ethernet-default | fq-codel  | cake
Upload               | 369.2Mbps           | 405.4Mbps                    | 853Mbps   | 789.8Mbps
Retries              | 1201.6              | 997.2                        | 221.8     | 283
Overall CPU usage    | 40-50%              | 40-50%                       | 60-75%    | 65-75%
Overall consistency  | Poor                | Poor                         | Very good | Good

Please take the CPU usage with a grain of salt: it doesn't directly indicate that some queues consume more CPU than others. It is more likely a consequence of the router being able to push more WireGuard traffic, and more CPU-bound WireGuard traffic means more CPU usage.
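
If you're curious about where that CPU time actually goes during a test, RouterOS has a small built-in profiler; this is purely an optional check, and the CPU figures in the table above simply come from the regular resource graphs.

# Show a live breakdown of CPU usage per internal task (stop it with Ctrl+C)
/tool/profile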

Based on these results, it seemed like an fq-codel queue was a good fit for my use-case: pushing as much WireGuard traffic as I could. But all this time I had only been testing uploads, so what about the download speed?

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.03  sec  73.9 MBytes  61.8 Mbits/sec  392            sender
[  5]   0.00-10.00  sec  70.8 MBytes  59.3 Mbits/sec                  receiver
[  7]   0.00-10.03  sec  55.8 MBytes  46.6 Mbits/sec  313            sender
[  7]   0.00-10.00  sec  52.9 MBytes  44.4 Mbits/sec                  receiver
[  9]   0.00-10.03  sec  57.6 MBytes  48.2 Mbits/sec  331            sender
[  9]   0.00-10.00  sec  54.5 MBytes  45.7 Mbits/sec                  receiver
[ 11]   0.00-10.03  sec  57.5 MBytes  48.1 Mbits/sec  268            sender
[ 11]   0.00-10.00  sec  53.6 MBytes  45.0 Mbits/sec                  receiver
[SUM]   0.00-10.03  sec   245 MBytes   205 Mbits/sec  1304             sender
[SUM]   0.00-10.00  sec   232 MBytes   194 Mbits/sec                  receiver

200Mbps? Over a 1Gbps line? Yikes! There must be another bottleneck somewhere else!

Crank it to the max!

After some more digging, I came across a MikroTik forum post from early 2025 complaining about Disappointing WireGuard performance on a RB5009.

One of the latest messages posted in this thread talked about manually setting the CPU frequency from its default of auto to the maximum allowed, here 1400MHz. The reasoning behind this is that WireGuard is not hardware-accelerated, at least not on MikroTik devices, and is therefore purely CPU-bound, with encryption speed directly correlating with CPU frequency. As the RB5009 has its CPU frequency set to auto by default, it should automatically scale up and down based on usage, but there seems to be an issue with the RB5009's frequency scaling down too aggressively, which appears to dramatically hurt the router's WireGuard performance.

After cross-checking this with other forum posts to see if other people also obtained similar results or if they encountered any issue afterwards, I decided to give this theory a try.
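
Before touching anything, we can check what frequency the CPU currently reports, using two read-only commands: /system/resource/print includes a cpu-frequency field with the live value, while the routerboard settings show what is configured.

# Live CPU frequency as currently reported by the system
/system/resource/print

# Configured CPU frequency (auto by default on the RB5009)
/system/routerboard/settings/print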

Let's manually set the RB5009 CPU frequency to its maximum to check if it is the culprit here.

A GIF from back to the future showing a hand increasing a circular knob named 'Overdrive'

Although there shouldn't be any risk associated with always running the RB5009's CPU at its maximum stock clock speed, the following commands are shared for educational purposes only. I cannot be held responsible if your router breaks, catches fire, starts forwarding packets faster than light speed or if it gains sentience.

/system/routerboard/settings/set cpu-frequency=1400MHz

Pressing enter returns an error message: failure: not allowed by device-mode.

Ah yes, the newly introduced device mode! As an additional security layer, we need to enable the routerboard flag of the device-mode before being allowed to modify the CPU frequency. At this point, running the /system/device-mode/print command on the router should show the following lines:

mode: advanced
routerboard: no

If your router is not yet set to the advanced mode, you will need to run the /system/device-mode/update mode=advanced command to set it accordingly.

Let's set the routerboard flag to yes, then:

/system/device-mode/update routerboard=yes

And, as instructed by the update: turn off power or reboot by pressing reset or mode button in 4m37s to activate changes message it prints, press the physical reset button on the router to confirm our choice.

Once rebooted, we can retry setting the CPU frequency to 1400MHz, which now returns Warning: cpu not running at default frequency instead of the failure: not allowed by device-mode error we got earlier. Perfect! A quick check with the /system/routerboard/settings/print command confirms that the router is now running with a manually set CPU frequency.

After running the RB5009 at its maximum frequency for a few weeks, I haven't noticed any sign of instability or any increase in CPU temperature, which still hovers around 40 to 45°C.
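
If you want to keep an eye on yours, the on-board health sensors, temperature included, can be printed from the console:

# Print the router's health sensors, including the temperature readings
/system/health/print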

Let's re-test our download speed, still with the fq-codel queue active:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.02  sec   248 MBytes   208 Mbits/sec  431            sender
[  5]   0.00-10.00  sec   245 MBytes   205 Mbits/sec                  receiver
[  7]   0.00-10.02  sec   325 MBytes   272 Mbits/sec  695            sender
[  7]   0.00-10.00  sec   321 MBytes   269 Mbits/sec                  receiver
[  9]   0.00-10.02  sec   229 MBytes   192 Mbits/sec  527            sender
[  9]   0.00-10.00  sec   226 MBytes   190 Mbits/sec                  receiver
[ 11]   0.00-10.02  sec   180 MBytes   150 Mbits/sec  458            sender
[ 11]   0.00-10.00  sec   177 MBytes   148 Mbits/sec                  receiver
[SUM]   0.00-10.02  sec   982 MBytes   822 Mbits/sec  2111             sender
[SUM]   0.00-10.00  sec   969 MBytes   812 Mbits/sec                  receiver

And we're basically maxing out the link between both hosts, quite an improvement over the 200Mbps obtained earlier!

But if the CPU frequency matters that much, how can we know that changing the queue was a good idea after all? Let's re-run the queue tests from earlier:

                     | only-hardware-queue | multi-queue-ethernet-default | fq-codel  | cake
Upload               | 852Mbps             | 870Mbps                      | 883Mbps   | 881.8Mbps
Retries              | 1943                | 2363.4                       | 3         | 2.8
Overall CPU usage    | 55-65%              | 55-65%                       | 60-70%    | 60-70%
Overall consistency  | Good                | Good                         | Very good | Very good

Based on these results, both fq-codel and cake yield a more consistent throughput, with a low retry count and a higher bandwidth utilization, and both seem like decent choices for my situation. Just to be safe, I ran a sustained speed-test over the WireGuard tunnel to verify that it wouldn't fall flat under prolonged load, which it didn't:

A screenshot of a WinBox byte graph showing a full 1Gbps utilization.

As shown in the screenshot, the transmission was rock solid, using the full 1Gbps link and without dropping any packets! Based on these results, I chose to keep using the cake queue with the CPU frequency manually set to its maximum. I've been running this configuration for a few weeks now without any issue, and it has even been used to transfer multiple terabytes of data without breaking a sweat, making it a great success!
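
To wrap up, here is roughly what the final configuration boils down to, with ether1 once again standing in as a placeholder for whichever interface carries the tunneled traffic; the queue assignment line is the same hedged example as earlier, so adapt it to your own setup.

# Create the cake queue type and assign it to the WAN interface
/queue/type/add cake-bandwidth=1000.0Mbps cake-nat=yes kind=cake name=cake
/queue/interface/set ether1 queue=cake

# Allow routerboard changes (confirmed with the reset button), then pin the CPU frequency
/system/device-mode/update routerboard=yes
/system/routerboard/settings/set cpu-frequency=1400MHz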

Thank you for reading this article, and see you in the next one!