Rate-limit 401 Unauthorized responses to prevent abuse/reflection based attacks #1588
e-lisa wants to merge 16 commits into coturn:master
Added rate-limiting to 401 Unauthorized responses to prevent abuse of the server
for use in DDoS attacks via traffic reflection and amplification.
This patch works by counting the number of requests that result in a 401
Unauthorized response and limiting them by IP address if they occurred in
a specified window of time.
ur_addr_map* functions were extended and wrapped to enable ioa_addr objects
with and without port numbers. In our use-case we need to be able to use the
*no_port variants when working with `ioa_addr` types.
Added new command-line options:
--no-ratelimit-401 - Disables rate-limiting of 401 responses
--ratelimit-401-requests-per-window - Sets the number of requests that result
in a 401 response per rate-limit window
--ratelimit-401-window-seconds - Sets the size in seconds of the rate-limit
window.
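To illustrate the counting scheme the commit message describes, here is a minimal sketch of a fixed-window limiter keyed by source IP address. It is not the patch's code: every name (`rl_table`, `rl_init`, `rl_allow_401`, `RL_SLOTS`) is hypothetical, addresses are reduced to IPv4 values, the table is searched with a plain linear scan to keep the example short, and treating a limit of 0 as "disabled" follows the reviewer suggestion in this thread rather than the patch's own options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define RL_SLOTS 1024 /* hypothetical table capacity */

typedef struct {
  bool used;
  uint32_t addr;       /* IPv4 source address; the port is ignored */
  time_t window_start; /* start of the current counting window */
  uint32_t count;      /* 401 responses sent in the current window */
} rl_entry;

typedef struct {
  rl_entry slots[RL_SLOTS];
  uint32_t max_per_window; /* assumption: 0 disables limiting */
  time_t window_seconds;
} rl_table;

void rl_init(rl_table *t, uint32_t max_per_window, time_t window_seconds) {
  assert(t != NULL);
  memset(t, 0, sizeof(*t));
  t->max_per_window = max_per_window;
  t->window_seconds = window_seconds;
}

/* Returns true if a 401 response to addr may still be sent at time now. */
bool rl_allow_401(rl_table *t, uint32_t addr, time_t now) {
  assert(t != NULL);
  if (t->max_per_window == 0) {
    return true; /* limiting disabled */
  }
  rl_entry *reusable = NULL;
  for (size_t i = 0; i < RL_SLOTS; i++) {
    rl_entry *e = &t->slots[i];
    bool expired = e->used && (now - e->window_start >= t->window_seconds);
    if (e->used && e->addr == addr) {
      if (expired) { /* the old window is over: start a fresh one */
        e->window_start = now;
        e->count = 0;
      }
      if (e->count >= t->max_per_window) {
        return false; /* over the limit: suppress the 401 */
      }
      e->count++;
      return true;
    }
    if (reusable == NULL && (!e->used || expired)) {
      reusable = e; /* remember the first free or expired slot */
    }
  }
  if (reusable == NULL) {
    return false; /* table full: assumption, drop rather than grow */
  }
  reusable->used = true;
  reusable->addr = addr;
  reusable->window_start = now;
  reusable->count = 1;
  return true;
}
```

With a limit of 3 per 60 seconds, the fourth call for the same address inside the window returns false, while another address, or the same address after the window has elapsed, is allowed again.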
jelmd
left a comment
If I understood the patch correctly, I doubt that it is able to prevent the mentioned DDoS: who says that a "smart" attacker will always use the same port? Furthermore, if there is a Distributed attack, IMHO it is even counterproductive. Also, a whole organization/company may sit behind a single firewall, so the defaults are probably not reasonable. Last but not least, the option names are far too long and should be shortened, and the request number limit (0 | > 0) should be used to decide whether to apply a limit or not (the intended option --no-ratelimit-401 is redundant); i.e. req-limit=0 sounds reasonable to me.
Finally, for now I think using tools like fail2ban might be the better option to apply rate limits, if one is convinced that this changes anything or helps somehow.
    addr_list_foreach_del_condition(rate_limit_map, ratelimit_delete_expired);
    TURN_MUTEX_UNLOCK(&rate_limit_main_mutex);
  }
Not sure when this gets called. I would guess by a worker thread, and thus all the rate-limit related mutexes will block all other workers from getting their work done?
IIRC all the work for the same client gets pushed to the same worker thread. So are those global locks really needed? Wouldn't a thread-local list be the better choice in this case?
The addr map implementation is not thread-safe; if two workers tried to modify the map at the same time, it could corrupt the data structure.
As for when it is run, see the check just above it:
if (ur_addr_map_num_elements(rate_limit_map) >= ADDR_MAP_SIZE) {
It is run when the map has reached ADDR_MAP_SIZE elements (the map is full).
I would be willing to move this to a lock local to the map, if you think it could cause performance issues.
Please let me know either way!
Let's remember that in this situation the diagram of the attack would look as follows:
Attacker's Spoofed UDP Packet -> Coturn -> Victim
This patch is meant to mitigate coturn's ability to be used to launch DDoS reflection/amplification attacks (not protect the server from a DDoS). By doing this it allows those running coturn to be good netizens by preventing their servers from being used in a malicious way.
Let me try to clarify things a little bit and elaborate on what we're trying to do here:
In my opinion it will because: a) The UDP packets are spoofed on behalf of the victim, the 401 response is reflected at the victim
This patch does not use port numbers, but rather the IP address of the source of the UDP packet. Also note the
I assume you have made the D bold here to suggest that the source addresses of the UDP packets would be distributed, however in this case the UDP packet source is spoofed. As far as the coturn server sees, there is only one source address (even if the attack is distributed from multiple sources). This attack would be distributed by using multiple coturn servers to attack a single target (not multiple attackers attacking the coturn service, but rather using it to reflect the 401 requests at a victim). Please also consider that we are not rate limiting all traffic. Only 401 Unauthorized responses. By doing this we should mitigate any unintentional abuse (For example spoofing a target to get them banned/rate limited).
Understood. What do you think better defaults would be? Remember this code only runs when a 401 response is sent. Do we believe there would be more than 100 unauthorized responses in a 60 second period of time? That would be one 401 Unauthorized response every 0.6 seconds, sustained for a full 60 seconds straight. It should also be noted that we're currently working on an allowlist that would exempt IP addresses from this rate limit. However, in my opinion this should be broken up into a second patch (which I will submit when completed).
This makes sense to me. I see no problem making these changes! As far as shortened, what do you think of:
A few things here: a) fail2ban is not really designed to mitigate (D)DoS style attacks, as it periodically reads the logs
In closing, I will try to address your code review as soon as possible. Thank you for taking the time to properly review the code!
9a3da18 to
e5d1c48
around rate limiting. Increased 401 rate limit window to 1000 requests and 120 seconds. Shortened some code for readability/brevity. Removed cruft. Removed new map functions using *no_port. Switched to turn_time().
For an immediate small fix I suggest using For some reason this config is an opt-in and not enabled by default.... |
  no_response = 1;
  char raddr[129];
  addr_to_string_no_port(rate_limit_address, (unsigned char *)raddr);
  TURN_LOG_FUNC(TURN_LOG_LEVEL_INFO, "401 rate limit exceeded from %s, response not sent\n", raddr);
Would it make sense to add the remote address to the "Cannot find credentials" log as well, which is shown before the rate limit kicks in?
coturn/src/server/ns_turn_server.c
Line 3475 in f6004a1
I agree with your logic. When I update the branch to the latest HEAD, I will make sure to make this change.
Thank you for the suggestion, it will help us get eyes on where the ratelimit is coming from.
As part of looking at #1588, I figured that sending the `SOFTWARE` attribute is also part of the problem, as it increases the size of the messages sent out by coturn and thus the amplification factor. For 4.6.2, the additional size is 24 bytes (4 bytes attribute header, and 20 bytes for "Coturn-4.6.2 'Gorst'").
If we are to use an example from #1588, "A 62 byte request will be met with Coturn’s 401 Unauthorized response which is 150 bytes, a factor of ~2.42." Without SOFTWARE the response will be 126 bytes, which reduces the amplification factor to ~2.
As I observed with multiple providers using coturn, some of them are sending it, meaning they do not set `--no-software-attribute`, most probably due to lack of clarity about this setting. I believe sending SOFTWARE_ATTRIBUTE should be off by default, which is hinted at in the RFC (https://datatracker.ietf.org/doc/html/rfc8489#section-16.1.2).
Detailed changes:
- Extract setting the attribute into a function to avoid code duplication
- This option is now not reloadable
- The option is now called `software_attribute` because inverse logic creates multiple double-not in the code which makes it harder to read.
- `no-software_attribute` is still functional but marked as deprecated in documentation
Test Plan:
- Run local tests with different cli arguments (new and deprecated) and confirm SOFTWARE attribute is off by default, and added when arguments say so
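The byte counts quoted above can be checked with a few lines of C. This is only a worked version of the arithmetic in the comment; the function names are hypothetical, and the only outside fact used is that a STUN attribute has a 4-byte header followed by a value padded to a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes reflected at the victim per byte the attacker has to send. */
double amplification_factor(size_t request_bytes, size_t response_bytes) {
  assert(request_bytes > 0);
  return (double)response_bytes / (double)request_bytes;
}

/* On-the-wire size of one STUN attribute: 4-byte header (type + length)
 * plus the value, padded up to a multiple of 4 bytes. */
size_t stun_attr_size(size_t value_len) {
  size_t padded = (value_len + 3u) & ~(size_t)3u;
  return 4u + padded;
}
```

For the 20-byte value "Coturn-4.6.2 'Gorst'", `stun_attr_size(20)` gives 24, so `amplification_factor(62, 150)` is about 2.42 with the attribute and `amplification_factor(62, 150 - 24)` is about 2.03 without it.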
Any news on this? Today several hundred coturn servers were used for an amplification attack on hosts of a French hosting company. One of our correctly configured servers has been used for this as well. Our German hosting provider acknowledged that hundreds of servers in their IP space were used for this attack and that they had to enable a manual mitigation. CoTURN is behaving correctly in rejecting unauthorized requests. But this enables amplification attacks if spoofed UDP packets are used...
Hello everyone... My coturn server was also part of the attack mentioned by @lordwebbie. Please prioritize this PR, as the whole world would benefit from this. Thank you.
Same here on Hetzner, after years of running quietly. Right now mitigating with blocking various malicious netblocks and with parameters set. But yeah, this PR would definitely help.
Same here. Thanks for this. 🫶
@e-lisa Thank you very much for working on this. A couple of questions/clarifications before we try to make more changes:
Could this attack also occur with STUN messages (Binding Requests) that don’t require authentication and therefore don’t trigger 401 responses?
It seems the attack might work without needing to spoof IP addresses, simply by exploiting STUN backward compatibility. Specifically, there's an attribute in the messages that allows the response to be sent to an arbitrary address: https://datatracker.ietf.org/doc/html/rfc3489#section-11.2.2
The recommendations to disable certain options (which, as far as I know, are disabled by default in the config file) seem like an effective countermeasure 👏
This PR came from Wire's attempt to mitigate abuses last year (we have been seeing them also for quite a while). On your questions @ggarber - since we authenticate the binding requests, and these were specifically targeted by the attacks, we worked on those. But probably unauthenticated ones are also affected. We have set the following options on our coturns, but the amplification factor of roughly 2.5x is still worth it for the attackers. Also, there is another kind of attack that is not caught by the rate limiting. Attackers are also targeting whole ranges of IP addresses (and not all ports on a given IP address, like in the latest Hetzner incident). |
As the developer of this patch, I am still looking at this and am open to suggestions. I will try to merge this branch with the latest code next week so this fix does not bitrot too much, as I do think many people can still benefit from running this fix. At the end of the day people are abusing coturn, and anything we can do to stop it is a win for all of us.
@mastaab - I think rate-limiting entire netblocks would be a separate feature. I don't think it is an unreasonable idea, but it would probably build on this work (and not block it). I also think this feature would need its own settings, as you would not want to apply the same rules for a single IP to an entire netblock (or vice versa).
* feat(coturn): disable software attribute coturn/coturn#1588 * additional flags
Sorry if I am being absolutely naive: would it help using a non standard port and in an obscure subdomain? stun.mydomain.com:3478 vs notudp.mydomain.com:4546 |
What about Crowdsec to prevent abuse? What are the patterns that should be looked for at syslog?
Co-authored-by: Copilot <[email protected]>
    ratelimit_init_map();
  }

  if (ur_addr_map_num_elements(rate_limit_map) >= ADDR_MAP_SIZE) {
This goes through the whole map in linear time and counts all the elements, every time it is called.
Could the count be tracked in an additional variable to avoid that cost?
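The suggestion can be sketched as follows: keep a counter next to the table and update it on every insert and delete, so the "is it full?" check is a single comparison. This is a hypothetical stand-alone container (`counted_set` and its functions are invented for illustration), not coturn's `ur_addr_map`, and it uses a tiny fixed-size array so the bookkeeping is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COUNTED_SET_SLOTS 8 /* hypothetical, deliberately small capacity */

typedef struct {
  uint32_t keys[COUNTED_SET_SLOTS];
  bool used[COUNTED_SET_SLOTS];
  size_t count; /* kept in sync on every insert and delete */
} counted_set;

/* Inserts key; returns false only if the set is full and key is absent. */
bool counted_set_put(counted_set *s, uint32_t key) {
  assert(s != NULL);
  size_t free_idx = COUNTED_SET_SLOTS;
  for (size_t i = 0; i < COUNTED_SET_SLOTS; i++) {
    if (s->used[i] && s->keys[i] == key) {
      return true; /* already present, count unchanged */
    }
    if (!s->used[i] && free_idx == COUNTED_SET_SLOTS) {
      free_idx = i;
    }
  }
  if (free_idx == COUNTED_SET_SLOTS) {
    return false;
  }
  s->used[free_idx] = true;
  s->keys[free_idx] = key;
  s->count++;
  return true;
}

/* Removes key; returns true if it was present. */
bool counted_set_del(counted_set *s, uint32_t key) {
  assert(s != NULL);
  for (size_t i = 0; i < COUNTED_SET_SLOTS; i++) {
    if (s->used[i] && s->keys[i] == key) {
      s->used[i] = false;
      s->count--;
      return true;
    }
  }
  return false;
}

/* Constant-time fullness check: no walk over the elements. */
bool counted_set_is_full(const counted_set *s) {
  assert(s != NULL);
  return s->count >= COUNTED_SET_SLOTS;
}
```

The trade-off is that every code path that adds or removes an element has to touch the counter, including bulk deletions of expired entries.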
Sidenote: We've observed some interesting interactions between this patch and nextcloud-talk-recording. Our This behavior should be documented somewhere, as it's extremely hard to debug (very little logs, the application itself not giving helpful output, etc). Also, IMO |
@e-lisa can you please add prometheus metrics?
Happy Anniversary! This critical security PR is now over a year old, and has been in the field, for over a year, defending coturn's more adventurous users. Could we merge this soon, please?
I can see if I can't fit this in this weekend, should be very useful.
I honestly have not looked at this. In the wild all I've seen is reports of using 401 responses for amplification. Someone should look into this. CC @ggarber
FYI, this branch is now updated to the latest master
I have now had a second large network operator where I host coturn point me to this PR, to ensure I had applied it. |
I ran into compilation issues with this PR on a system running Debian Trixie and GCC 14.2.0, and another running Arch Linux / GCC 15.2.1. To fix these errors, I had to make the following small changes:
diff --git a/src/server/ns_turn_maps.c b/src/server/ns_turn_maps.c
index 9f9c43d..c195336 100644
--- a/src/server/ns_turn_maps.c
+++ b/src/server/ns_turn_maps.c
@@ -825,7 +825,7 @@ int addr_list_foreach_del_condition(ur_addr_map *map, ur_addr_map_cond_func func
if (func(elem->value)) {
free((void *)elem->value);
memset(&(elem->key), 0, sizeof(ioa_addr));
- elem->value = NULL;
+ elem->value = (ur_addr_map_value_type)NULL;
count++;
}
}
@@ -838,7 +838,7 @@ int addr_list_foreach_del_condition(ur_addr_map *map, ur_addr_map_cond_func func
if (func(elem->value)) {
free((void *)elem->value);
memset(&(elem->key), 0, sizeof(ioa_addr));
- elem->value = NULL;
+ elem->value = (ur_addr_map_value_type)NULL;
count++;
}
}
diff --git a/src/server/ns_turn_maps.h b/src/server/ns_turn_maps.h
index bc17a3c..795b5bf 100644
--- a/src/server/ns_turn_maps.h
+++ b/src/server/ns_turn_maps.h
@@ -53,6 +53,7 @@ typedef struct _ur_map ur_map;
typedef uint64_t ur_map_key_type;
typedef uintptr_t ur_map_value_type;
+typedef uintptr_t ur_addr_map_value_type;
typedef void (*ur_map_del_func)(ur_map_value_type);
typedef int (ur_addr_map_cond_func)(ur_addr_map_value_type);
@@ -191,8 +192,6 @@ bool lm_map_foreach_arg(lm_map *map, foreachcb_arg_type func, void *arg);
//////////////// UR ADDR MAP //////////////////
-typedef uintptr_t ur_addr_map_value_type;
-
#define ADDR_MAP_SIZE (1024)
#define ADDR_ARRAY_SIZE (4)
Hopefully this is useful to anyone else encountering the same problem. I suspect there were some breaking changes in GCC that triggered it.
Problem:
Attackers are using Coturn's 401 Unauthorized responses with spoofed UDP packets to create a ~2:1 amplification/reflection attack. A 62 byte request will be met with Coturn’s 401 Unauthorized response which is 150 bytes, a factor of ~2.42.
These attacks hurt the performance of Coturn servers as well as their reputation.
Tickets reporting bulk 401 responses in their logs:
Reports of potential 401 response based reflection attacks in the wild:
Related issue:
Steps to reproduce:
bin/turnutils_uclient with this config file to generate 401 errors for the COTURN server with the following configuration:
.pcap file with tcpdump, wireshark or packet capture tool of your choice.
Solution:
Added rate-limiting to 401 Unauthorized responses to prevent abuse of the server for use in DDoS attacks via traffic reflection and amplification. This should be the default behavior of Coturn to prevent abuse. An option has been added to disable this feature for debugging.
This patch works by counting the number of requests that result in a 401 Unauthorized response and limiting them by IP address if they occurred in a specified window of time.
ur_addr_map* functions were extended and wrapped to enable ioa_addr objects with and without port numbers. In our use-case we need to be able to use the *no_port variants when working with ioa_addr types.
Incoming port numbers are ignored by setting the port to 0 before storing a copy of the ioa_addr object.
Added new command-line options: