Sunday, March 3, 2013

An Operational Reason for Knowing Trivia

I've been largely out of touch with the IT certification scene lately, but I'm sure that people are still complaining incessantly about the fact that they need to memorize "trivia" in order to pass certification tests. Back when I was teaching Cisco classes full-time, my certification-oriented students were particularly bitter about this. Of course, this is a legitimate debate and the definition of "trivia" varies from person to person.

When I saw this article about CloudFlare's worldwide router meltdown, however, I immediately felt a bit smug about all those hours spent learning and teaching packet-level trivia. If you don't want to read the article, here's the tl;dr:

  • their automated DDoS detection tool reported an attack against a customer using packets in the 99,000-byte size range
  • their ops staff pushed rules to their routers to drop those packets
  • their routers crashed and burned
So at this point you should be saying what some of the commenters did: huh? An IPv4 packet has a maximum size of 65,535 bytes, because the Total Length field in the header is only 16 bits long.
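
To make the arithmetic concrete, here is a minimal Python sketch. The constant and function names are my own, made up for illustration; the only real API used is the standard library's struct module, which refuses to pack a value that doesn't fit in a 16-bit unsigned field.

```python
import struct

# The IPv4 Total Length field is a 16-bit unsigned integer,
# so the largest value it can hold is 2**16 - 1.
IPV4_TOTAL_LENGTH_BITS = 16
MAX_IPV4_PACKET_BYTES = 2**IPV4_TOTAL_LENGTH_BITS - 1


def fits_in_total_length(size_bytes):
    """Return True if size_bytes can be encoded in a 16-bit unsigned field.

    Hypothetical helper for illustration only.
    """
    try:
        # "!H" means network byte order, unsigned 16-bit integer.
        struct.pack("!H", size_bytes)
    except struct.error:
        return False
    return True


if __name__ == "__main__":
    for size in (1500, MAX_IPV4_PACKET_BYTES, 99000):
        print(size, fits_in_total_length(size))
```

A size in the 99,000 range simply has nowhere to go in the header, which is exactly why the tool's output should have raised eyebrows.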

In order for this meltdown to happen, they had to have a compounded series of errors:

  • the attack detection tool was coded to allow detection of packet sizes that can't actually occur: no bounds checking.
  • the ops staff didn't retain the "trivia" that they learned in Networking 101, and thus couldn't see the problem with the output generated by the detection tool.
  • the router OS didn't do input validation, and blew up when attempting to configure itself to do something crazy.
There's lots of blame to go around here, and my intent isn't to add to it. Rather, my point is this: if you dutifully memorized and retained stuff like the maximum IP packet size, feel good about yourself for a few minutes! And next time you write networking code: do input validation and bounds checking.
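
For what it's worth, the kind of check I have in mind is only a few lines. This is a rough sketch, not anything CloudFlare or their router vendor actually runs: the function name, the rule shape (a low/high packet-length range), and the choice of 20 bytes as the lower bound (the smallest IPv4 header, with no options) are all my assumptions.

```python
# Assumed bounds for an IPv4 packet-length match (illustrative only).
MIN_IPV4_PACKET_BYTES = 20      # smallest IPv4 header, no options, no payload
MAX_IPV4_PACKET_BYTES = 65535   # largest value of the 16-bit Total Length field


def validate_length_match(low, high):
    """Validate a hypothetical 'drop packets sized low..high' rule.

    Returns (low, high) unchanged if the range is possible on the wire,
    and raises ValueError otherwise, before anything is pushed to a router.
    """
    if not isinstance(low, int) or not isinstance(high, int):
        raise ValueError("packet lengths must be integers")
    if low > high:
        raise ValueError(f"empty range: {low} > {high}")
    if low < MIN_IPV4_PACKET_BYTES or high > MAX_IPV4_PACKET_BYTES:
        raise ValueError(
            f"range {low}-{high} is outside the possible IPv4 packet sizes "
            f"({MIN_IPV4_PACKET_BYTES}-{MAX_IPV4_PACKET_BYTES})"
        )
    return low, high


if __name__ == "__main__":
    print(validate_length_match(1400, 1500))
    try:
        validate_length_match(99000, 99999)
    except ValueError as err:
        print("rejected:", err)
```

Rejecting the rule at any one of the three layers (detection tool, human review, router config parser) would have been enough.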
