Tuesday Technical Tweet Thread Time! Let's go on the roller coaster of what happens at a low level when a DNS server sends an 4,000 byte EDNS0 response to a client whose MTU is 1200 bytes. Confused already? don't worry, we'll break it down. I promise it's super interesting.
-
-
Satellites, Radio systems, things like POS (Packet over Sonet), ATM, DSL, Token Ring, and more. These could all have different MTUs that actually limit how much data you could send or receive in one packet.
Show this thread -
The underlying reasons are usually esoteric, like how long you're allowed to send signals for at the electrical level. Anyway, main point: different mediums = different MTUs. MTU stands for "Maximum Transmission Unit".
Show this thread -
Different networks can be linked though, and you can only get a packet through all of the links if it's <= the lowest MTU of any of the links, so you need a way to discover the whole paths MTU. We'll come back to this, I promise. Placeholder for now.
Show this thread -
There's also a minimum MTU that *any* IP network has to support. It's 576 bytes, everything has to handle that. So DNS traditionally takes a shortcut around path MTU discovery and just says "lets use UDP and just stick to the minimum".
Show this thread -
So it says "Requests and responses have to be be smaller than 512 bytes", which when you add the 20 byte UDP header, and the 20 byte IP header, leaves 24 bytes of space for IP options. So it all fits.
Show this thread -
Fitting a DNS request into 512 bytes isn't too hard, it's why DNS has a limit on how big domain names can be. But since humans have to type them sometimes, long ones would always be a pain. No big deal.
Show this thread -
Fitting an entire DNS response into 512 bytes was easy to start out with .. just a few IPs, but it's gotten harder over time. Email, Anti-Spam measures, IPv6, and DNSSEC (which is garbage, but that's a different topic) have all pushed the size of responses up and up.
Show this thread -
DNS has a built in mechanism to handle this, called truncation. If the response that a server needs to send is too big, it sends as much as it can and then marks a bit that means "This response was truncated".
Show this thread -
The requester then retries the request over TCP, instead of UDP. Because TCP is intended for "bigger" messages. Two problems with this: sometimes people block TCP DNS without realizing, and TCP *still* has to figure out what the path MTU is.
Show this thread -
So here's how path MTU discovery works for TCP. The TCP connection starts with its best guess of what the MTU is, and calculates a "Maximum Segment Size" from this. This isn't the TCP window. It's the maximum data that TCP will put in a single packet.
Show this thread -
The client sends SYN, the server sends SYN|ACK, usual TCP handshake. These packets are small and rarely trigger path MTU discovery. For DNS; the request packet will be small too and likely won't trigger anything.
Show this thread -
But when the server sends the response, that will be big, maybe too big. But it sends it out anyway. That packet gets as far as it gets. If it gets to a link that has a smaller MTU, that link sends back an Internet Control Messsage Protocol (ICMP) message that says "MTU exceeded"
Show this thread -
It sends to this back to the sender ... the DNS server in this case. And so the sender has a clue what even triggered the error, it includes the top part of the packet that triggered the error.
Show this thread -
Basically the server tried to send a long letter by relay mail, and got to a relay that only has small envelopes. So that relay tore the original letter in half and send back that half in a small envelope saying "I only have small envelopes, so try again, good luck".
Show this thread -
O.k. so now the sender knows a new lower MTU for that path, and it caches it. The kernel keeps a cache of all MTUs it knows for all destinations. Now it recomputes a new TCP MSS and resends the data but in smaller packets. Yay! This is part of why TCP is reliable.
Show this thread -
This whole process takes a damn while though; the client had to fall back to TCP, and then the path mtu discovery had to happen, and finally the client gets the response ... as long as TCP wasn't blocked to begin with.
Show this thread -
So the DNS folks are like "Wouldn't it be great if we could just send large responses over UDP". And it's true. It would. And so it was invented, as part of EDNS0, a general extension mechanism for DNS.
Show this thread -
With EDNS0 enabled, the client can include a little fake sort of record that says "hey, I support EDNS0, and also I'm good if you send me large UDP responses ... up to 4,000 bytes, say".
Show this thread -
If a server also supports EDNS0, it can just send larger UDP responses, rather than truncating and falling back to TCP. We're now nearing the top of the rollercoaster.
Show this thread -
What happens when we send a large UDP response? Well as we saw earlier the UDP and IP headers both have length fields. The way it works out is that fragmentation actually happens at the IP layer ...
Show this thread -
Suppose we try to send a 4,000 byte UDP datagram over a 1500 byte MTU IP network, what happens is that it gets broken into 3 IP "fragments". The first contains the first part of the UDP message, including the UDP header itself, and the next packets follow on from there.
Show this thread -
Each fragment will have the same IP "identification" header (IPID) and different fragment offsets. It's the kernel's job to wait for all of the fragments to show up and pass them on to an application.
Show this thread -
Path MTU discovery works the same as with TCP. If the first fragment is too big, the sender will get an error. But since applications usually don't retry UDP responses, it can also just show up as that message being entirely dropped.
Show this thread -
It also means that fragments of the message look strange to many parts of the network, like firewalls and switches. UDP packets without UDP headers! How are they supposed to know whether the packet is allowed or not? How is flow-switching supposed to work? The internet shrugs.
Show this thread -
We often end up building in the ability to relate these packets to one another in many places, and since that's expensive, they also have to be rate-limited. It's deeply messy.
Show this thread -
That's what's going on down there though, and understanding that full picture is key to diagnosing some common network mysteries. O.k. I have a team meeting now, more later!
Show this thread -
Meeting postponed! o.k. so what happens when the DNS server sends a 4K response and the MTU is 1200 bytes? Well the DNS server gets the error, and it fixes the *next* response, but that first is lost. Super annoying. So the client has to retry, but then it works.
Show this thread -
Some more fun: network paths don't have to be symmetrical, so MTUs don't either. If the client needs to send a large amount of data, this whole process happens for them too. A sender and a receiver can legitimately end up with different limits towards one another.
Show this thread -
Because MTU discovery depends on state, and on ICMP messages being allowed, some folks do something like "MSS clamping" where for TCP they have the network actually meddle with the TCP connection (a little) to offer a different MSS to the other end.
Show this thread -
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.