o.k. so DNS, the so-called "phonebook of the internet" (if you look the other way and ignore that that's a better metaphor for Google) ... ANYWAY ... DNS runs over the User Datagram Protocol (UDP).
-
-
Show this thread
-
If you know anything about UDP it's that it's a "Fire and Forget" protocol, it can be lossy. From the perspective of an application, you send a packet, and it gets to the other end or it doesn't. If you want reliability, you have to retry yourself or have some kind of fallback.
Show this thread -
UDP is a "layer 4" protocol. It runs on top of the Internet Protocol .. which is a "layer 3" protocol. Scarequotes because the OSI layering model is insanely wrong and confusing in modern contexts.
Show this thread -
Here's what an IP header looks like. Study it. There will be an exam.pic.twitter.com/Ozzri8uvoe
Show this thread -
-
O.k. so back to DNS for a bit. So DNS is a request-response protocol. Requests are like "What's the IP address for http://www.amazon.com " and responses are like "here's the IP addresses for http://www.amazon.com ".
Show this thread -
IP-based networks have limits for how much data they can handle in a single packet. Now if you study the IP and UDP headers, you'll that they both have "length" fields, and these fields are 16-bits. So you'd think that we should be able to send up to 65535 bytes in a packet.
Show this thread -
That would be *way* too simple. Historically, the size of the packets we could send has actually limited by the underlying technology. Ethernet is often limited to 1500 bytes per packet, or up to 9,000 if you enable a special mode called "Jumboframes".
Show this thread -
Satellites, Radio systems, things like POS (Packet over Sonet), ATM, DSL, Token Ring, and more. These could all have different MTUs that actually limit how much data you could send or receive in one packet.
Show this thread -
The underlying reasons are usually esoteric, like how long you're allowed to send signals for at the electrical level. Anyway, main point: different mediums = different MTUs. MTU stands for "Maximum Transmission Unit".
Show this thread -
Different networks can be linked though, and you can only get a packet through all of the links if it's <= the lowest MTU of any of the links, so you need a way to discover the whole paths MTU. We'll come back to this, I promise. Placeholder for now.
Show this thread -
There's also a minimum MTU that *any* IP network has to support. It's 576 bytes, everything has to handle that. So DNS traditionally takes a shortcut around path MTU discovery and just says "lets use UDP and just stick to the minimum".
Show this thread -
So it says "Requests and responses have to be be smaller than 512 bytes", which when you add the 20 byte UDP header, and the 20 byte IP header, leaves 24 bytes of space for IP options. So it all fits.
Show this thread -
Fitting a DNS request into 512 bytes isn't too hard, it's why DNS has a limit on how big domain names can be. But since humans have to type them sometimes, long ones would always be a pain. No big deal.
Show this thread -
Fitting an entire DNS response into 512 bytes was easy to start out with .. just a few IPs, but it's gotten harder over time. Email, Anti-Spam measures, IPv6, and DNSSEC (which is garbage, but that's a different topic) have all pushed the size of responses up and up.
Show this thread -
DNS has a built in mechanism to handle this, called truncation. If the response that a server needs to send is too big, it sends as much as it can and then marks a bit that means "This response was truncated".
Show this thread -
The requester then retries the request over TCP, instead of UDP. Because TCP is intended for "bigger" messages. Two problems with this: sometimes people block TCP DNS without realizing, and TCP *still* has to figure out what the path MTU is.
Show this thread -
So here's how path MTU discovery works for TCP. The TCP connection starts with its best guess of what the MTU is, and calculates a "Maximum Segment Size" from this. This isn't the TCP window. It's the maximum data that TCP will put in a single packet.
Show this thread -
The client sends SYN, the server sends SYN|ACK, usual TCP handshake. These packets are small and rarely trigger path MTU discovery. For DNS; the request packet will be small too and likely won't trigger anything.
Show this thread -
But when the server sends the response, that will be big, maybe too big. But it sends it out anyway. That packet gets as far as it gets. If it gets to a link that has a smaller MTU, that link sends back an Internet Control Messsage Protocol (ICMP) message that says "MTU exceeded"
Show this thread -
It sends to this back to the sender ... the DNS server in this case. And so the sender has a clue what even triggered the error, it includes the top part of the packet that triggered the error.
Show this thread -
Basically the server tried to send a long letter by relay mail, and got to a relay that only has small envelopes. So that relay tore the original letter in half and send back that half in a small envelope saying "I only have small envelopes, so try again, good luck".
Show this thread -
O.k. so now the sender knows a new lower MTU for that path, and it caches it. The kernel keeps a cache of all MTUs it knows for all destinations. Now it recomputes a new TCP MSS and resends the data but in smaller packets. Yay! This is part of why TCP is reliable.
Show this thread -
This whole process takes a damn while though; the client had to fall back to TCP, and then the path mtu discovery had to happen, and finally the client gets the response ... as long as TCP wasn't blocked to begin with.
Show this thread -
So the DNS folks are like "Wouldn't it be great if we could just send large responses over UDP". And it's true. It would. And so it was invented, as part of EDNS0, a general extension mechanism for DNS.
Show this thread -
With EDNS0 enabled, the client can include a little fake sort of record that says "hey, I support EDNS0, and also I'm good if you send me large UDP responses ... up to 4,000 bytes, say".
Show this thread -
If a server also supports EDNS0, it can just send larger UDP responses, rather than truncating and falling back to TCP. We're now nearing the top of the rollercoaster.
Show this thread -
What happens when we send a large UDP response? Well as we saw earlier the UDP and IP headers both have length fields. The way it works out is that fragmentation actually happens at the IP layer ...
Show this thread -
Suppose we try to send a 4,000 byte UDP datagram over a 1500 byte MTU IP network, what happens is that it gets broken into 3 IP "fragments". The first contains the first part of the UDP message, including the UDP header itself, and the next packets follow on from there.
Show this thread -
Each fragment will have the same IP "identification" header (IPID) and different fragment offsets. It's the kernel's job to wait for all of the fragments to show up and pass them on to an application.
Show this thread - 8 more replies
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.