At IT Freedom, we hear about most software exploits through normal vendor channels or various governmental Computer Emergency Response Teams (CERTs). The announcements are usually in the form, “There’s a bug in our software, and here’s the fix.”
Zero-day exploits are a different matter, though. These target vulnerabilities in production software that were previously unknown to the vendor, so the vendor has had no time to fix the vulnerable code. The discovery of and response to zero-days are often frantic, disorganized, hysterical, and flat-out wrong.
A good example is a security issue discovered just a few weeks ago in the Network Time Protocol daemon (ntpd), which runs on many common server platforms to keep their clocks in sync with universal time. The hole supposedly allowed anyone on the Internet to easily gain root-level access to a server through ntpd.
Here’s how it played out with us:
Discovery of a Zero-Day Exploit
On Friday, December 19 at about 5:30pm, I got an IM from Brian Camp, our Director of Technology: “Hey, not sure if you heard, but there's a new ntpd security vulnerability. Very uncoordinated announcement with only a source code patch available on a site that is now down. No patches from vendors yet.”
I replied, “No, hadn't heard. How did you hear about it? A CERT advisory?”
Brian: “No, just another mailing list I’m on. This advisory is a mess. Where I saw it was actually someone just saying how great it was that OpenBSD's implementation wasn't affected. We still haven't got a notice from CERT or any of the many vendor lists we’re on despite many of those vendors being affected.”
Initially, we had very little concrete information to go on. How serious was the vulnerability? How easily could it be exploited? Which vendor implementations were affected? The CERT Knowledge Base listed several buffer overflow vulnerabilities, but was pretty vague on the conditions required to exploit them, other than saying they could be exploited remotely.
Meanwhile, the news organizations, in their haste to scoop the incident, were consistent in their stellar job of spreading confusion and alarm. Several articles appeared, e.g., "Serious NTP security holes have appeared here..." and "Exploits Circulating for Remote Code Execution...", from various news sites all specifically claiming that exploits were in the wild and were being used to compromise production servers. The articles have no references to back up that claim, but that’s par for the course.
At the same time, Red Hat (a Linux operating system vendor) posted an article about CVE-2014-9295 saying that all of the ntpd exploits required either non-default crypto configurations, local access, or prior authentication, which would seem to imply that the vulnerability was not as severe as first thought.
Of note, there was no mention anywhere of whether it was just NTP servers affected or NTP clients as well.
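One way to act on a statement like Red Hat's is to check whether a server's ntpd configuration uses the crypto features at all. The sketch below is illustrative only and rests on an assumption that is not in Red Hat's article as quoted here: that "non-default crypto configurations" refers to ntpd's Autokey feature, which is turned on in ntp.conf by a crypto line or by the autokey option on a server or peer line. The function name and the sample configuration are hypothetical.

```python
def autokey_directives(conf_text):
    """Return (line_number, line) pairs that appear to enable Autokey.

    Assumption: Autokey is enabled either by a `crypto` directive or by
    the `autokey` option on an association line (server, peer, and so on).
    """
    association_keywords = ("server", "peer", "pool", "broadcast", "manycastclient")
    hits = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        words = line.split()
        if words[0] == "crypto":
            hits.append((number, line))
        elif words[0] in association_keywords and "autokey" in words[1:]:
            hits.append((number, line))
    return hits


# Hypothetical ntp.conf fragment, used only to demonstrate the check.
SAMPLE = """\
# example configuration
driftfile /var/lib/ntp/drift
server 0.pool.ntp.org iburst
server time.example.com autokey
crypto pw examplepassword
"""

for number, line in autokey_directives(SAMPLE):
    print(f"line {number}: {line}")
```

A server whose configuration produces no hits would, under the assumption above, fall into the default case that Red Hat described as not remotely exploitable without local access or prior authentication. It is a quick triage aid, not a substitute for patching.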
Given this limited, confusing, and conflicting information, Brian decided to play it safe and shut down our Internet-facing ntpd service. Another message from Brian: “I switched our public NTP server over to OpenNTPD instead of the standard ntp.org implementation, so we're good there. I'll push updates to our customer servers, who all use the NTP client, after Red Hat/Centos have pushed updates out.”
The Waiting Game
We then waited for more information. Brian emailed me Saturday morning:
Zero day:
I haven't seen any reliable evidence that the NTP issue is a zero-day situation. The NTP project support site that lists information on the vulnerability is remarkably still down, but the page is in Google's cache. The short of it is, "Google reported the issue and you can get a patch here". There is no mention of what conditions are required to exploit the issues or if it was discovered because it was being exploited. There are those articles [two previous news articles referenced above] that still say 'exploits are circulating', and now the CERT advisory says the same, but those purported exploits aren't being posted anywhere that I've looked. Security researchers aren't posting about it either; nobody is bragging yet about getting a working exploit, etc.
At this point, my best guess is that the NTP project heard about it from Google, who probably found the issue during a code review, not while examining an exploited system. The NTP project then did a terrible job of handling the issue by posting info about it to their website and mailing list with minimal, if any, advance notice to the major vendors. That website went down due to excessive load (or possibly a denial-of-service attack), and the news organizations filled the information void with their usual self-perpetuating click-bait headlines and hysteria. There could very well be exploits floating around, but if so, they aren't as public as what is being claimed, and the news organizations are just making up something that they happen to be right about.
Last night, I installed OpenNTPD (an alternate ntpd implementation) on our public NTP server and removed the ntp.org ntpd. That should permanently fix the issue for that server, unless we end up running into problems with OpenNTPD. OpenNTPD has been around for about ten years and I've personally used it without issue for most of that time, but it's primarily an OpenBSD program and not a Linux one.
Late last night, Red Hat and the CentOS project released new binary NTP packages, which corrected the vulnerabilities. I wrote a playbook (script) for Ansible for applying the updates and then used it to push them to all of our customer servers running the NTP client. The script took care of the usual checking for updates, downloading them, applying them, and restarting the ntpd daemon as necessary.
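The steps Brian describes (check for an update, apply it, restart ntpd only when something changed) can be sketched in a few lines. This is not his Ansible playbook, which isn't reproduced here; it is a Python stand-in for a single host, assuming a Red Hat or CentOS system where the package is named ntp and the service is named ntpd. The function name and the injectable run parameter are hypothetical, added so the logic can be exercised without touching a real system.

```python
import subprocess


def update_ntp(run=subprocess.call):
    """Check for an ntp package update, apply it, and restart ntpd.

    `run` takes a command as a list of strings and returns its exit status.
    Returns True when an update was applied, False when none was available.
    """
    # yum check-update exits with 100 when updates are available and 0 when not.
    status = run(["yum", "check-update", "ntp"])
    if status == 0:
        return False  # nothing to do, so leave the running daemon alone
    if status != 100:
        raise RuntimeError(f"yum check-update failed with status {status}")

    if run(["yum", "-y", "update", "ntp"]) != 0:
        raise RuntimeError("yum update failed")

    # Restart so the running daemon picks up the patched binary.
    if run(["service", "ntpd", "restart"]) != 0:
        raise RuntimeError("ntpd restart failed")
    return True


def fake_run(command):
    """Dry-run stand-in: print the command and pretend an update exists."""
    print("would run:", " ".join(command))
    return 100 if "check-update" in command else 0


update_ntp(run=fake_run)
```

Calling update_ntp() with no arguments would run the real commands and needs root. A configuration management tool such as Ansible adds what this sketch lacks: running the same steps across many servers and reporting which ones actually changed.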
No further details of the exploitability of the issue have been released. Red Hat hasn't updated their advisory that asserts it’s only exploitable in non-standard configurations. News sites and Twitter are full of “patch now, this is being exploited!” type stuff, but after spending an hour or so looking this morning, I couldn't find any more information than what was present in the minimal advisories last night. Most actually (eventually) link back to the original two rushed and almost certainly inaccurate articles from last night. The modern security meme.
As mentioned above, Linux vendors fairly quickly released fixes to ntpd. Apple took a couple more days to release a security update for OS X and, for the first time ever, used an automatic deployment mechanism that it had introduced in OS X a couple of years prior. Microsoft Windows implementations of NTP were unaffected.
As it turned out, this vulnerability almost certainly was not a zero-day exploit. The supposedly easy root access turned out to be all media hype, as described above. As of this writing, no exploits have been published, and Red Hat still maintains that the issue requires unusual circumstances to exploit.
At the time, though, since we didn’t know the facts, we had to play it safe and treat it as a zero-day. We believe that Google engineers discovered and reported it, but if so, why did they not give vendors a heads-up first, so that they could distribute fixes when it was announced? Or did somebody leak the news?
This incident also highlights the really sorry state of a lot of very critical, widely used Internet software. If you’re concerned about our decaying transportation infrastructure, you should be even more concerned about our aging Internet infrastructure. Brian again:
Why I already hated the ntp.org NTP implementation:
I implied it in my message earlier, but just to state it outright: The NTP project's NTP implementation is really bad. I was thinking earlier Friday (sort of deja vu when this issue came out) about putting something on the Twiki about several aspects of modern Linux that just don't work right and NTP was one of them. It’s an old, extremely complicated piece of software focused on timekeeping on a very large scale. It has very, very poor defaults and cannot be relied on to automatically keep time (its sole function) on some types of machines, namely, virtual machines, hosts that reboot frequently, and hosts with intermittent network connectivity. On those machines, you'll frequently run into problems after time gets screwed up, due to an external cause (VM was paused, host rebooted, whatever), and time will be broken until you manually do something, no matter how well you've tuned the NTP config. I think, and hope, that this latest issue will be the nail in the coffin for the NTP project's NTP implementation being the standard Linux client.
This incident also exemplifies how much art, skill, and dedication are required to do IT really well. It isn’t simply a matter of checking a box that says, “Install updates automatically.” You have to keep your systems as up-to-date and as secure as possible, of course, but you also have to keep a constant lookout for the unexpected, not ignore the slightest hint of an issue, and have the processes, procedures, tools, and people in place to respond quickly when something bad happens.