And here's a UDP header ...pic.twitter.com/dmAJqYul1l
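The picture doesn't come through here, so here's a rough sketch of the same thing in Python: a UDP header is just four 16-bit fields (source port, destination port, length, checksum). The port numbers and payload below are made up.

    import struct

    # UDP header (RFC 768): source port, destination port,
    # length (header + payload), checksum -- four 16-bit fields, 8 bytes total.
    src_port, dst_port = 12345, 53          # made-up client port, DNS server port
    payload = b"\x00" * 40                  # stand-in for a DNS query
    length = 8 + len(payload)               # length covers the header *and* the data
    checksum = 0                            # 0 means "no checksum" over IPv4
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    assert len(header) == 8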
DNS has a built-in mechanism to handle this, called truncation. If the response that a server needs to send is too big, it sends as much as it can and then sets a bit that means "This response was truncated".
The requester then retries the request over TCP instead of UDP, because TCP is intended for "bigger" messages. Two problems with this: sometimes people block DNS over TCP without realizing it, and TCP *still* has to figure out what the path MTU is.
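A sketch of that fallback using the dnspython library (the name and resolver address are just for illustration): send the query over UDP, and if the TC bit comes back set, repeat it over TCP.

    import dns.flags
    import dns.message
    import dns.query

    query = dns.message.make_query("example.com", "TXT")
    response = dns.query.udp(query, "8.8.8.8", timeout=2)

    if response.flags & dns.flags.TC:
        # Truncated: the server couldn't fit the answer in UDP,
        # so ask again over TCP.
        response = dns.query.tcp(query, "8.8.8.8", timeout=2)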
So here's how path MTU discovery works for TCP. The TCP connection starts with its best guess of what the MTU is, and calculates a "Maximum Segment Size" (MSS) from it. This isn't the TCP window; it's the maximum amount of data that TCP will put in a single packet.
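Roughly, for plain IPv4 and TCP headers with no options:

    mtu = 1500            # e.g. Ethernet's usual MTU
    ip_header = 20        # IPv4 header, no options
    tcp_header = 20       # TCP header, no options
    mss = mtu - ip_header - tcp_header
    print(mss)            # 1460: the most payload one TCP segment will carry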
The client sends SYN, the server sends SYN|ACK, the usual TCP handshake. These packets are small and rarely trigger path MTU discovery. For DNS, the request packet will be small too and likely won't trigger anything.
But when the server sends the response, that one will be big, maybe too big. It sends it out anyway, and that packet gets as far as it gets. If it reaches a link with a smaller MTU, the router at that link sends back an Internet Control Message Protocol (ICMP) message that says "MTU exceeded".
It sends this back to the sender ... the DNS server in this case. And so that the sender has a clue about what even triggered the error, it includes the top part of the packet that triggered it.
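Here's roughly what that ICMP message looks like on the wire (field layout per RFC 1191): type 3 "destination unreachable", code 4 "fragmentation needed", the next-hop MTU, and then the start of the packet that caused the trouble. A parsing sketch:

    import struct

    def parse_frag_needed(icmp_bytes):
        # Type 3 / code 4 is "fragmentation needed and DF set".
        # RFC 1191 puts the next-hop MTU in bytes 6-7 of the ICMP header,
        # followed by the offending packet's IP header + first 8 data bytes.
        icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack(
            "!BBHHH", icmp_bytes[:8])
        if icmp_type == 3 and code == 4:
            offending_start = icmp_bytes[8:]
            return next_hop_mtu, offending_start
        return None, None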
Basically the server tried to send a long letter by relay mail, and got to a relay that only has small envelopes. So that relay tore the original letter in half and sent back that half in a small envelope saying "I only have small envelopes, so try again, good luck".
O.k. so now the sender knows a new lower MTU for that path, and it caches it. The kernel keeps a cache of all MTUs it knows for all destinations. Now it recomputes a new TCP MSS and resends the data but in smaller packets. Yay! This is part of why TCP is reliable.
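On Linux you can peek at that cache from a connected socket. A sketch (the IP_MTU constant value is from the kernel headers, since not every Python build exposes it by name, and the address is just an example):

    import socket

    IP_MTU = getattr(socket, "IP_MTU", 14)    # value from <linux/in.h>

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(("192.0.2.1", 53))              # TEST-NET address, illustration only
    print(s.getsockopt(socket.IPPROTO_IP, IP_MTU))   # kernel's current path MTU for that destination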
This whole process takes a damn while though; the client had to fall back to TCP, and then the path MTU discovery had to happen, and finally the client gets the response ... as long as TCP wasn't blocked to begin with.
So the DNS folks are like "Wouldn't it be great if we could just send large responses over UDP". And it's true. It would. And so it was invented, as part of EDNS0, a general extension mechanism for DNS.
With EDNS0 enabled, the client can include a little fake sort of record that says "hey, I support EDNS0, and also I'm good if you send me large UDP responses ... up to 4,000 bytes, say".
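With dnspython that little record (the EDNS0 OPT pseudo-record) is one argument away. A sketch, again with a made-up name and server, advertising 4,096 bytes:

    import dns.message
    import dns.query

    # use_edns=0 adds the OPT pseudo-record to the additional section;
    # payload=4096 is the "I'm good with UDP responses up to 4096 bytes" part.
    query = dns.message.make_query("example.com", "TXT", use_edns=0, payload=4096)
    response = dns.query.udp(query, "8.8.8.8", timeout=2)
    print(response.payload)   # the size the server advertised back (0 if no EDNS0)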
If a server also supports EDNS0, it can just send larger UDP responses, rather than truncating and falling back to TCP. We're now nearing the top of the rollercoaster.
What happens when we send a large UDP response? Well as we saw earlier the UDP and IP headers both have length fields. The way it works out is that fragmentation actually happens at the IP layer ...
Suppose we try to send a 4,000-byte UDP datagram over an IP network with a 1,500-byte MTU. What happens is that it gets broken into 3 IP "fragments". The first contains the first part of the UDP message, including the UDP header itself, and the next packets follow on from there.
Each fragment will have the same IP "identification" field (IPID) and a different fragment offset. It's the kernel's job to wait for all of the fragments to show up and pass the reassembled datagram on to the application.
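Back-of-envelope version of that split, assuming a 20-byte IPv4 header and exactly 4,000 bytes of DNS payload:

    # Each fragment carries at most MTU - 20 bytes of the original datagram,
    # rounded down to a multiple of 8 (fragment offsets are stored in 8-byte units).
    udp_datagram = 8 + 4000                  # UDP header + DNS payload
    per_fragment = (1500 - 20) // 8 * 8      # 1480

    offset = 0
    while offset < udp_datagram:
        size = min(per_fragment, udp_datagram - offset)
        more_fragments = offset + size < udp_datagram
        print(f"offset={offset:>5}  bytes={size:>5}  MF={int(more_fragments)}")
        offset += size
    # offset=    0  bytes= 1480  MF=1
    # offset= 1480  bytes= 1480  MF=1
    # offset= 2960  bytes= 1048  MF=0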
Path MTU discovery works the same as with TCP. If the first fragment is too big, the sender will get an error. But since applications usually don't retry UDP responses, it can also just show up as that message being entirely dropped.
It also means that fragments of the message look strange to many parts of the network, like firewalls and switches. UDP packets without UDP headers! How are they supposed to know whether the packet is allowed or not? How is flow-switching supposed to work? The internet shrugs.
We often end up building in the ability to relate these fragments to one another in many places, and since that's expensive, that work also has to be rate-limited. It's deeply messy.
That's what's going on down there though, and understanding that full picture is key to diagnosing some common network mysteries. O.k. I have a team meeting now, more later!
Meeting postponed! o.k. so what happens when the DNS server sends a 4K response and the MTU is 1200 bytes? Well the DNS server gets the error, and it fixes the *next* response, but that first one is lost. Super annoying. So the client has to retry, but then it works.
Some more fun: network paths don't have to be symmetrical, so MTUs don't either. If the client needs to send a large amount of data, this whole process happens for them too. A sender and a receiver can legitimately end up with different limits towards one another.
Because MTU discovery depends on state, and on ICMP messages being allowed, some folks do something like "MSS clamping" where for TCP they have the network actually meddle with the TCP connection (a little) to offer a different MSS to the other end.
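The "little meddling" amounts to rewriting one TCP option in the SYN as it passes through. A toy sketch of the idea (real clamping lives in the router or firewall, e.g. netfilter's TCPMSS target, and also fixes up the TCP checksum, which this skips):

    import struct

    def clamp_mss(tcp_bytes: bytes, max_mss: int) -> bytes:
        """Lower the MSS option (kind 2) in a TCP SYN if it exceeds max_mss."""
        segment = bytearray(tcp_bytes)
        header_len = (segment[12] >> 4) * 4    # data offset field, in 32-bit words
        i = 20                                 # options start after the fixed header
        while i < header_len:
            kind = segment[i]
            if kind == 0:                      # end-of-options
                break
            if kind == 1:                      # NOP padding
                i += 1
                continue
            length = segment[i + 1]
            if kind == 2 and length == 4:      # MSS option
                mss = struct.unpack_from("!H", segment, i + 2)[0]
                if mss > max_mss:
                    struct.pack_into("!H", segment, i + 2, max_mss)
            i += length
        return bytes(segment)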