CloudCoreRouter and
RouterOS v6.x
Tips and tricks!
Moscow
MUM Russia 2014
RouterOS v6 Tile architecture
● First 64-bit RouterOS
● Multi Memory Channel support (faster RAM)
● Hardware Accelerated Multi-Threading (no RPS and IRQ configuration needed)
● Hardware Accelerated Encryption
Yes, still - Packet Flow Diagram
(page 3)
Multi-Core Packet Processing
● On receive, each packet is assigned to a CPU core
● RouterOS tries to keep packets from the same connection assigned to the same CPU core
● If a CPU core is overloaded, packets of a single connection are distributed between cores
● Re-assigning a packet from one CPU core to another is a very “expensive” process
● Processing a packet on each separate CPU core can take a different amount of time, so packet order may change during processing
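One way to picture connection-to-core affinity is a hash over the connection's 5-tuple. The sketch below is only an illustration: the slides do not say how RouterOS picks a core, so the CRC32 hash, the `core_for_packet` helper and the 36-core default (the CCR1036 core count) are all assumptions.

```python
import zlib

NUM_CORES = 36  # assumption: core count of a CCR1036


def core_for_packet(src_ip, dst_ip, proto, src_port, dst_port, num_cores=NUM_CORES):
    """Map a connection 5-tuple to a core index (hypothetical helper).

    Every packet of the same connection hashes to the same value, so it
    lands on the same core and no costly re-assignment is needed.
    """
    flow_key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(flow_key) % num_cores


first = core_for_packet("10.0.0.1", "10.0.0.2", "tcp", 40000, 80)
second = core_for_packet("10.0.0.1", "10.0.0.2", "tcp", 40000, 80)
print(first == second)  # same connection, same core
```

A static mapping like this cannot spread one overloaded connection across cores. That is the case where packets get distributed and their order can change.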
Fast Path
● Fast Path forwards packets without additional processing in the Linux kernel, which significantly improves forwarding speed.
● Fast Path requirements:
– Fast Path must be allowed in the configuration
– The interface driver must support it
– Specific configuration conditions must be met
● Currently RouterOS has Fast Path handlers for: IPv4 routing, traffic generator, MPLS, bridge
● More handlers will be added in the future
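The requirements above combine as a logical AND: a packet takes the fast path only if every condition holds and a handler exists for that kind of traffic. The following is a minimal sketch of that decision, not RouterOS code; `choose_path` and its parameters are hypothetical names.

```python
# Handlers listed on the slide; the string labels are illustrative only.
FAST_PATH_HANDLERS = {"ipv4 routing", "traffic generator", "mpls", "bridge"}


def choose_path(traffic_type, fast_path_allowed, driver_supports_fast_path,
                config_conditions_met):
    """Return "fast" only when every requirement is satisfied (hypothetical helper)."""
    eligible = (
        fast_path_allowed
        and driver_supports_fast_path
        and config_conditions_met
        and traffic_type in FAST_PATH_HANDLERS
    )
    return "fast" if eligible else "slow"


print(choose_path("bridge", True, True, True))
print(choose_path("bridge", True, False, True))
```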
New Throughput test results
Throughput in millions of packets per second (pps)
Traffic Generator Tool
● Traffic Generator is an evolution of the bandwidth test tool
● Traffic Generator can:
– Determine transfer rates and packet loss
– Detect out-of-order packets
– Collect latency and jitter values
– Inject and replay *.pcap files
– TCP protocol emulation (work in progress)
● “Quick” mode
● Full Winbox support (coming soon)
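To make the measured values concrete, the sketch below derives loss, out-of-order count, latency and jitter from sequence numbers and timestamps. It is a generic illustration, not the Traffic Generator's implementation. The `stream_stats` helper is hypothetical, and jitter is taken here as the mean absolute difference between consecutive latencies, which is one common definition and may differ from what the tool reports.

```python
from statistics import mean


def stream_stats(sent_count, received):
    """Summarise a test stream (hypothetical helper).

    received: list of (sequence_number, sent_time, received_time) tuples,
    in arrival order. Times are in seconds.
    """
    seen = set()
    highest_seq = -1
    out_of_order = 0
    latencies = []
    for seq, sent_at, received_at in received:
        if seq in seen:
            continue  # ignore duplicates
        seen.add(seq)
        if seq < highest_seq:
            out_of_order += 1  # arrived after a later-numbered packet
        else:
            highest_seq = seq
        latencies.append(received_at - sent_at)
    deltas = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    return {
        "loss": 1 - len(seen) / sent_count,
        "out_of_order": out_of_order,
        "avg_latency": mean(latencies) if latencies else 0.0,
        "jitter": mean(deltas) if deltas else 0.0,
    }


# Five packets sent; packet 4 is lost and packet 2 arrives after packet 3.
arrivals = [(0, 0.0, 1.0), (1, 1.0, 2.0), (3, 3.0, 4.5), (2, 2.0, 5.0)]
print(stream_stats(5, arrivals))
```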
Queuing Changes
● Packets can be placed in a queue by any number of CPU cores, but are processed and taken out of the queue by only a single CPU core
● In RouterOS v5.x there were several different places in a packet's “life-cycle” where it could be queued
● In RouterOS v6.x the QoS system was redesigned so that queuing always happens in the same place relative to other processes in the router
● Now all queuing happens at the very end of a packet's “life-cycle” in the router
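The "many cores enqueue, one core dequeues" rule is the classic multi-producer, single-consumer pattern. The sketch below models it with Python threads purely as an analogy. The thread and function names are hypothetical and nothing here reflects the actual kernel implementation.

```python
import queue
import threading


def run_queue_demo(num_cores=4, packets_per_core=100):
    """Several 'cores' enqueue packets; exactly one 'core' dequeues them."""
    shared_queue = queue.Queue()
    dequeued = []

    def enqueue_from_core(core_id):
        for packet_no in range(packets_per_core):
            shared_queue.put((core_id, packet_no))

    def dequeue_on_single_core():
        while True:
            item = shared_queue.get()
            if item is None:  # sentinel: all producers are done
                break
            dequeued.append(item)

    consumer = threading.Thread(target=dequeue_on_single_core)
    consumer.start()
    producers = [
        threading.Thread(target=enqueue_from_core, args=(core_id,))
        for core_id in range(num_cores)
    ]
    for producer in producers:
        producer.start()
    for producer in producers:
        producer.join()
    shared_queue.put(None)
    consumer.join()
    return dequeued


packets = run_queue_demo()
print(len(packets))
```

Because only one consumer drains the queue, its throughput caps the whole queue no matter how many cores feed it.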
HTB in RouterOS v5
HTB in RouterOS v6
Simple Queues
● The matching algorithm has been updated
– based on a hash
– faster mismatches
● At least 32 top-level queues are necessary to fully utilize the CCR1036's potential (~9x faster than a single queue)
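Why a hash makes mismatches faster can be seen by comparing a top-down scan with a dictionary lookup. This is a simplified sketch: it assumes each queue targets exactly one address, and the slides do not describe what RouterOS actually hashes on. All names and addresses are made up for the example.

```python
def match_linear(queues, address):
    """Scan queues top-down; return (queue_name, comparisons_made)."""
    comparisons = 0
    for name, target in queues:
        comparisons += 1
        if target == address:
            return name, comparisons
    return None, comparisons


def build_hash_index(queues):
    """Index queues by target address; the first queue wins, as in a top-down scan."""
    index = {}
    for name, target in queues:
        index.setdefault(target, name)
    return index


# 200 hypothetical queues, one per address.
queues = [(f"queue{i}", f"10.0.0.{i}") for i in range(1, 201)]
index = build_hash_index(queues)

# An address with no queue: the scan touches every entry, the hash lookup does not.
print(match_linear(queues, "192.168.1.1"))
print(index.get("192.168.1.1"))
```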
Queue Tree and CCR
● Currently (RouterOS v6.11) only one CPU core can take packets out of one HTB tree
● We are working on a possible update of the HTB algorithm, or on introducing a completely new method instead of HTB
● Suggestions:
– Use Interface HTB as much as possible to offload traffic from HTB “global”
– Use simple queues
PPTP, L2TP and PPPoE on CCR
● Changes introduced in v6.8:
– kernel drivers for ppp, pppoe, pptp and l2tp are now lock-less on transmit and receive
– all ppp packets (except discovery packets) can now be handled by multiple cores
– the MPPE driver can now handle up to 256 out-of-order packets (previously even a single out-of-order packet was dropped)
– MPPE driver encryption performance roughly doubled
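The effect of tolerating out-of-order packets can be modelled with a sliding acceptance window over sequence numbers. This is a toy model, not the MPPE driver. It ignores counter wrap-around and key handling, and it reads "up to 256 out-of-order packets" as a window of 256 sequence numbers, which is an assumption. `ReorderWindow` is a hypothetical name.

```python
class ReorderWindow:
    """Accept packets that are at most `size - 1` behind the newest one seen."""

    def __init__(self, size=256):
        self.size = size
        self.highest = None
        self.seen = set()

    def accept(self, seq):
        if self.highest is None or seq > self.highest:
            self.highest = seq
            # Forget sequence numbers that fell out of the window.
            self.seen = {s for s in self.seen if s > seq - self.size}
            self.seen.add(seq)
            return True
        if seq <= self.highest - self.size or seq in self.seen:
            return False  # too old, or a duplicate
        self.seen.add(seq)
        return True


tolerant = ReorderWindow(size=256)
strict = ReorderWindow(size=1)  # models dropping any late packet
for window in (tolerant, strict):
    window.accept(5)
print(tolerant.accept(4), strict.accept(4))
```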
Single PPTP Tunnel Performance on CCR1036
in packets per second with 0.01% loss tolerance
Single L2TP Tunnel Performance on CCR1036
in packets per second with 0.01% loss tolerance
Single PPPoE Tunnel Performance on CCR1036
in packets per second with 0.01% loss tolerance
CCR and Packet Fragments
● Currently (in RouterOS v6.11) Connection Tracking requires packets to be re-assembled before further processing
● It is impossible to ensure that all fragments of a packet are received by the same CPU core
● The process that stores fragments and waits to re-assemble them nullifies all multi-core benefits
● We plan to:
– Add full Path MTU Discovery support to all tunnels and interfaces
– Update Connection Tracking to handle fragments
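The store-and-wait step can be sketched as a shared reassembly buffer keyed by source, destination, protocol and IP identification. This is a simplified model of generic IPv4 reassembly, not RouterOS code. Offsets are in bytes here (real IPv4 uses 8-byte units), there are no timeouts or overlap checks, and `Reassembler` is a hypothetical name.

```python
class Reassembler:
    """Hold fragments until a datagram is complete (simplified sketch)."""

    def __init__(self):
        # (src, dst, proto, ident) -> {offset: (payload, more_fragments)}
        self.pending = {}

    def add(self, src, dst, proto, ident, offset, more_fragments, payload):
        """Store one fragment; return the full payload once it is complete."""
        key = (src, dst, proto, ident)
        fragments = self.pending.setdefault(key, {})
        fragments[offset] = (payload, more_fragments)

        data = bytearray()
        expected_offset = 0
        while expected_offset in fragments:
            chunk, more = fragments[expected_offset]
            data += chunk
            if not more:  # last fragment reached with no gaps before it
                del self.pending[key]
                return bytes(data)
            if not chunk:  # guard against an empty non-final fragment
                break
            expected_offset += len(chunk)
        return None  # still waiting for missing fragments


buffer = Reassembler()
# The second fragment arrives first, so the datagram has to wait.
print(buffer.add("10.0.0.1", "10.0.0.2", "udp", 7, 4, False, b"WXYZ"))
print(buffer.add("10.0.0.1", "10.0.0.2", "udp", 7, 0, True, b"ABCD"))
```

All cores would have to share a buffer like this one, which is why waiting for fragments cancels the multi-core gains.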
Firewall Efficiency
● Each firewall rule in RouterOS takes a dedicated place in system memory (RAM)
● The CPU needs to process a packet through every rule the packet passes before it is captured by a rule
● Reducing the average number of rules a packet needs to pass before it is captured can significantly improve firewall performance
● Make use of action=jump
● Simplify rules
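The gain from action=jump can be counted directly. The sketch below models a rule set as plain Python data and compares how many rules one packet is checked against in a flat chain versus a chain split by protocol. The rule model, chain names and port numbers are invented for the illustration and are not RouterOS syntax.

```python
def port_rule(proto, port, action="accept"):
    """Rule matching one protocol and port (hypothetical model)."""
    def match(packet):
        return packet["proto"] == proto and packet["port"] == port
    return (match, action, None)


def jump_rule(proto, target_chain):
    """Rule that jumps to another chain for one protocol."""
    def match(packet):
        return packet["proto"] == proto
    return (match, "jump", target_chain)


def rules_checked(chains, packet, chain="forward"):
    """Return (verdict, number of rules the packet was checked against)."""
    checked = 0
    for match, action, target in chains[chain]:
        checked += 1
        if not match(packet):
            continue
        if action == "jump":
            verdict, sub_checked = rules_checked(chains, packet, target)
            checked += sub_checked
            if verdict is not None:
                return verdict, checked
        else:
            return action, checked
    return None, checked  # fell off the end of the chain


tcp_rules = [port_rule("tcp", port) for port in range(1, 31)]
udp_rules = [port_rule("udp", port) for port in range(1, 31)]

flat = {"forward": tcp_rules + udp_rules}
split = {
    "forward": [jump_rule("tcp", "tcp-chain"), jump_rule("udp", "udp-chain")],
    "tcp-chain": tcp_rules,
    "udp-chain": udp_rules,
}

packet = {"proto": "udp", "port": 30}
print(rules_checked(flat, packet))   # ('accept', 60)
print(rules_checked(split, packet))  # ('accept', 32)
```

In this model the UDP packet skips all 30 TCP rules at the cost of two jump rules.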
Changes in the Firewall
● Firewall now has “all-ether”, “all-wireless”, “all-vlan” and “all-ppp” as interface matching options
● Only 2 dynamic “change-mss” mangle rules are created for “all-ppp” interfaces
● New mangle actions “sniff-tzsp” and “sniff-pc” send a packet stream to a remote sniffer
Layer-7
● Layer-7 is the most “expensive” firewall option: it takes a lot of memory and processing power to match each connection against a regexp string
● Layer-7 should be used only on traffic that cannot be identified any other way
● Layer-7 should be used only as a trigger: use connection-mark or address-list to keep track of related packets or connections
● Do not use a direct action (such as accept or drop) in a Layer-7 rule
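The "trigger" idea is to run the regexp only until a connection is identified, then rely on a cheap mark lookup. The sketch below shows that pattern in plain Python. `L7Trigger`, the regular expression and the mark name are hypothetical, and this does not reproduce the real Layer-7 matcher.

```python
import re


class L7Trigger:
    """Run the costly regexp once per connection, then reuse the stored mark."""

    def __init__(self, pattern, mark):
        self.regex = re.compile(pattern)
        self.mark = mark
        self.connection_marks = {}  # connection id -> mark
        self.regex_runs = 0         # counts the expensive operations

    def classify(self, connection_id, payload):
        # Cheap path: the connection was already identified.
        if connection_id in self.connection_marks:
            return self.connection_marks[connection_id]
        # Expensive path: inspect the payload with the regexp.
        self.regex_runs += 1
        if self.regex.search(payload):
            self.connection_marks[connection_id] = self.mark
            return self.mark
        return None


trigger = L7Trigger(rb"^GET ", "http-conn")  # illustrative pattern only
trigger.classify(1, b"GET /index.html HTTP/1.1\r\n")
trigger.classify(1, b"binary payload that no longer matches")
print(trigger.regex_runs)  # the second packet did not need the regexp
```

Later rules can then act on the mark, so accept or drop decisions stay off the regexp path.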
Routing and CCR
● Packet routing can utilize all cores
● All dynamic routing protocols (more precisely, routing table updates and protocol calculations) in RouterOS v6.x are limited to a single core
– One BGP full feed will take 1-3 min to load on CCR
– Two BGP full feeds will take 6 min to load on CCR
● Try to avoid configurations that continuously update the routing table
● All routing protocols will be updated to multi-core for RouterOS v7
IPSec and CCR
● Hardware acceleration support for aes-cbc + md5|sha1|sha256 Authenticated Encryption with Associated Data (AEAD) was added on CCR in RouterOS v6.8
● CCR1036 can now handle 3.2 Gbps of encrypted IPSec traffic, measured with:
– ~80% CPU load maintained
– No fragmentation (1470-byte packets)
– Many peers (100 separate tunnels)
– AES128
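The headline figure can be converted into packet rates with simple arithmetic. The calculation below assumes that the 3.2 Gbps is counted over the 1470-byte packets and that the load is spread evenly over the 100 tunnels. The slides state neither, so treat the results as rough estimates. `ipsec_rates` is a hypothetical helper.

```python
def ipsec_rates(total_bps=3.2e9, packet_bytes=1470, tunnels=100):
    """Return (total pps, bits per second per tunnel, pps per tunnel)."""
    total_pps = total_bps / (packet_bytes * 8)
    return total_pps, total_bps / tunnels, total_pps / tunnels


total_pps, bps_per_tunnel, pps_per_tunnel = ipsec_rates()
print(round(total_pps))         # about 272 thousand packets per second
print(bps_per_tunnel / 1e6)     # 32.0 Mbps per tunnel
print(round(pps_per_tunnel))    # about 2.7 thousand pps per tunnel
```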
Tools
● /system resource cpu
● /tool profile
Partitions
● Partitions let you always keep one working copy of RouterOS just one reboot away, and back up the configuration before major changes
Questions!!!