I'd like to tell you about a fairly obvious (it seems to me) way to filter DNS Amplification Attacks, and about a module that was written to implement the idea.
Many have encountered DNS Amplification Attacks; some have had more luck confronting the problem, some less. The attack itself is carried out by sending a DNS query to a DNS server with a spoofed source IP address, the address of the victim. A DNS response is almost always larger than the query, especially since the attacker usually sends an ANY query. AAAA records are common, and TXT records carry SPF and other information, so it is relatively easy to achieve fivefold or greater amplification. This looks very tempting to attackers, because they can mount a decent DoS attack even without a large botnet. One could dwell for a long time on why IP spoofing is still possible, but it is a fact, and that is why it is so important to make it very difficult for attackers to abuse your DNS servers. It is worth noting that these attacks affect both authoritative DNS servers and public resolvers; the solution described below can be used in both cases.

Main methods of tackling DNS Amplification Attacks:
There's nothing complicated here. Many firewalls have a feature that blocks traffic from a source once its packet rate exceeds a certain number. If you would rather not run a firewall on the DNS servers, you can run tcpdump periodically, parse its output, and route unwanted traffic to /dev/null. In some cases it is acceptable to add the attacker's IP address to the loopback interface (Igor Sysoev recommended this method at one of his conference talks as a way to avoid running a firewall on FreeBSD). You can also mirror the traffic on a switch, analyse it separately, and push the result to a border router to block the traffic. There are many options, but they all share a downside: we are bound to lose some legitimate traffic. Let's not forget that we are blocking a spoofed IP, which can belong to anyone, from a provider's DNS servers to the servers of the company you work for.
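As a minimal sketch of the firewall approach, assuming a Linux box with iptables and the hashlimit match available (the rate, burst, and rule name here are examples, not recommendations):

```shell
# Drop UDP DNS queries from any single source IP that exceeds
# roughly 100 packets/s.  hashlimit keeps a per-source-IP token
# bucket, so well-behaved resolvers below the rate are unaffected.
iptables -A INPUT -p udp --dport 53 \
    -m hashlimit --hashlimit-above 100/sec --hashlimit-burst 200 \
    --hashlimit-mode srcip --hashlimit-name dns-flood \
    -j DROP
```

This inherits the downside described above: once the spoofed source exceeds the rate, legitimate queries from that (innocent) address are dropped along with the attack traffic.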
The DNS packet header contains a TC flag. If the TC flag is set, the client must repeat the query over TCP; all other fields of the answer are ignored. The idea of the method is that an attacker will not repeat the query over TCP, since it makes no sense for them to do so, whereas a genuine client will. Of course, TCP for DNS is slower, but firstly the response will be cached by the recursive resolver or the client, and secondly some latency is not so bad compared to the complete unavailability of the DNS server. This method has already been implemented in some DNS servers: in PowerDNS you can set the TC flag when responding to ANY queries, which is a good compromise. This variant is not ideal either, though. There are still servers on the Internet that do not fully follow the RFCs or are misconfigured, so simply setting the TC flag for all queries does not guarantee that everyone will repeat their queries over TCP. Another thing to remember is that although setting the TC flag in the DNS application reduces the outgoing traffic, and hence the bandwidth through the network equipment, the DNS server still has to process a lot of incoming packets, wasting context switches and warming up data centres.

What else can be done?
I wanted to suggest a new solution. There are no ideal solutions when you deal with DoS, and this one is not perfect either. For example, it does not protect the server from being used to reflect DNS traffic without amplification. Nevertheless, it has some advantages:
I also aimed to make the solution as simple as possible to use, so that iptables, additional rules and so on are not necessary. The module is Linux-specific, but it should be possible to implement something similar for FreeBSD, too.
How does it work?
We count the number of packets arriving from each IP address over a stated period of time. If the number of packets from a certain IP exceeds the threshold, we form a UDP response with the TC flag set and drop the query. This immediately reduces the context switching caused by the DNS server application having to process the traffic. Upon receiving the TC-flagged response over UDP, a legitimate client will repeat its query over TCP, and that traffic will reach the DNS server.
The header structure is the same for DNS queries and responses. Moreover, the header is the only part of the packet that is required for a DNS answer to be considered correct. All this lets us implement the described approach efficiently.
Another good point is that the DNS header has a fixed size of 12 bytes, so the scheme is very simple; we do not even need to parse the whole DNS header. We check that a packet arriving on UDP port 53 carries more than 12 bytes of data, copy the first 12 bytes of the query into a new packet (while writing this, it occurred to me that it may also be worth validating other header fields), set the TC bit and the response bit, and send it back. As we have copied only the header, it is advisable to also zero the QDCOUNT field, otherwise we'll get parse warnings on the client side. The original query is then dropped. All of this can be done directly in the NF_INET_LOCAL_IN hook.

We also push the source IP into a kfifo queue to gather statistics. The incoming packet statistics are counted asynchronously, in a separate thread, using a red-black tree. The extra latency this adds is minimal, because kfifo is a lock-free data structure and a queue is created for each CPU. It does, however, become necessary to configure the interval according to the expected PPS. There is also a limit on the memory allocated for per-CPU data: it is currently 32 KB, so we create a queue of 4096 IP addresses for each CPU. Having chosen a 100 ms interval, we can count up to 40960 PPS per CPU, which is enough for most cases. On the other hand, a queue overflow only means that some data is lost from the statistics.
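The header manipulation described above can be sketched in plain C. This is a userspace illustration, not the module's actual code; the function name is made up, and the bit positions follow RFC 1035:

```c
#include <stdint.h>
#include <string.h>

#define DNS_HDR_LEN 12
#define DNS_FLAG_QR 0x8000u  /* "this is a response" bit */
#define DNS_FLAG_TC 0x0200u  /* "truncated, retry over TCP" bit */

/* Build a 12-byte truncated reply from the first 12 bytes of a query.
 * Returns 0 on success, -1 if the packet is too short to be a query. */
static int build_tc_reply(const uint8_t *query, size_t len,
                          uint8_t reply[DNS_HDR_LEN])
{
    uint16_t flags;

    if (len < DNS_HDR_LEN)
        return -1;

    /* copy only the header: ID, flags, and the four count fields */
    memcpy(reply, query, DNS_HDR_LEN);

    /* flags live in bytes 2..3, big-endian on the wire */
    flags = (uint16_t)((reply[2] << 8) | reply[3]);
    flags |= DNS_FLAG_QR | DNS_FLAG_TC;
    reply[2] = (uint8_t)(flags >> 8);
    reply[3] = (uint8_t)(flags & 0xff);

    /* zero QDCOUNT (bytes 4..5) so clients do not try to parse a
     * question section that we did not copy */
    reply[4] = reply[5] = 0;
    return 0;
}
```

The transaction ID is copied unchanged, so the client matches the reply to its outstanding query; everything after the 12-byte header is simply absent, which the TC flag makes legitimate.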
A logical question arises here: why not simply use a hash table instead?
Unfortunately, careless use of hashing in places like this opens up a vulnerability to another type of attack: collision attacks. If an attacker discovers that a hash table is used in a time-critical part of the code, they can craft input so that lookups run in O(n) instead of O(1). This type of attack is difficult to detect: nothing appears to have happened, yet the server is down.
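The degradation is easy to demonstrate with a toy chained hash table (an illustration only; the table size, names, and the deliberately weak modulo hash are made up for the example):

```c
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 8

struct node { int key; struct node *next; };

/* key comparisons performed by find() so far */
static size_t comparisons;

/* head-insert into the bucket chosen by a naive modulo hash */
static void ht_insert(struct node *buckets[NBUCKETS], int key)
{
    struct node *n = malloc(sizeof(*n));
    n->key = key;
    n->next = buckets[key % NBUCKETS];
    buckets[key % NBUCKETS] = n;
}

static struct node *ht_find(struct node *buckets[NBUCKETS], int key)
{
    struct node *n;
    for (n = buckets[key % NBUCKETS]; n; n = n->next) {
        comparisons++;
        if (n->key == key)
            return n;
    }
    return NULL;
}
```

If an attacker feeds in keys that are all congruent modulo NBUCKETS, every key lands in the same bucket, and each lookup walks a single chain of length n: the table has silently become a linked list. A keyed hash or a balanced tree (as the module uses) removes this failure mode.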
If the PPS from a blocked IP drops below the threshold, the block is released. A hysteresis can be configured, which by default equals 10% of the threshold.
The link to the project is given at the end of the article. Any constructive comments are welcome.
To load the module, run

insmod ./kfdns.ko threshold=100 period=100 hysteresis=10

in the directory with the new module.
threshold is the number of packets per period above which the TC flag will be set;
period is the counting period in ms (with these settings, the filter is activated if we receive more than 100 packets from one IP within 100 ms);
hysteresis is the difference between the threshold for activating the filter and the threshold for releasing it. A hint: if you set hysteresis=threshold, then once blocking is activated it will never be released. This can be useful in some cases.
Once you have loaded the module, you can see statistics on the IPs being filtered.
To create the attack load I used dnsperf (two instances: one on a virtual machine, the other on a notebook; unfortunately, even that was not enough to fully load the system). The DNS server ran in KVM under CentOS, with pdns-recursor as the DNS server software.
The graphs show the counter values before and after the module was loaded. Throughout the experiment the rate stayed at around 80 kpps.
The outgoing traffic was reduced, which is what we wanted to achieve. As we can see, after the module was activated, the outgoing traffic became even lower than the incoming traffic, which is in fact logical: remember that we copy only the header.
The reduction in context switching is immediate. Very good.
This is what happened to the system: system time and user time usage dropped noticeably. The changes in steal time are caused by virtualisation, so their appearance is also logical. The slight increase in irq time is quite interesting and could be a target for future experiments.
What would I add in the future?
For now the project is a young one but I hope it may be of interest and use to some of you.
Further reading and references: