tmp2000 · February 28, 2018 16:07 · Dec 1, 2015 · Dec 1, 2015
diff --git a/ocsp-stapling.md b/ocsp-stapling.md
@@ -114,4 +114,15 @@ If this seems like an unfairly long list, the reality is that virtually all
 of this is supported by Microsoft IIS services today. The Microsoft
 documentation is a bit spread out, but [this](https://technet.microsoft.com/en-us/library/ee619754(v=ws.10).aspx)
 is good for starters, and [this](https://technet.microsoft.com/en-us/library/ee619723(v=ws.10).aspx) is good
-for further reading.
+for further reading.
+
+Given this long list of things, which do seem somewhat 'basic', it would be a
+shame to require every TLS server to reimplement them. This seems ideal to have
+as a common, stand-alone daemon/service, which can then interface with a variety
+of TLS servers (IMAP, SMTP, HTTP, FTP, XMPP, etc.).
+
+Perhaps the most basic interface for this is simply dropping the OCSP response
+to a well-known path pre-agreed with the server. The server can monitor for
+changes to this file. When changes are noticed, it can start serving the
+new response. While some logic (such as shutting down the service) may be more
+complicated, this at least provides some basic functionality to start from.
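
A minimal sketch of this file-drop interface, in Python, assuming a POSIX-style filesystem where a rename within one directory is atomic. The names `publish_response` and `StapleFile` are hypothetical and are not taken from any existing daemon or TLS server. The daemon writes to a temporary file and renames it into place, so the server never reads a half-written response; the server polls the file's metadata and reloads when it changes.

```python
import os
import tempfile


def publish_response(path: str, der_bytes: bytes) -> None:
    """Daemon side (hypothetical): replace the stapled response atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ocsp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(der_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


class StapleFile:
    """Server side (hypothetical): reload the response when the file changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.response = None
        self._signature = None

    def poll(self) -> bool:
        """Return True if a new response was loaded since the last poll."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return False
        signature = (st.st_ino, st.st_mtime_ns, st.st_size)
        if signature == self._signature:
            return False
        with open(self.path, "rb") as f:
            self.response = f.read()
        self._signature = signature
        return True
```

A real server would still validate the reloaded bytes before stapling them; that step is left out of this sketch.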
diff --git a/ocsp-stapling.md b/ocsp-stapling.md
@@ -0,0 +1,117 @@
+On Twitter [the other day](https://twitter.com/sleevi_/status/669562330405978112),
+I was lamenting the state of OCSP stapling support on Linux servers, and got
+asked by several people to write up what I think the requirements are for OCSP
+stapling support.
+
+1.  Support for keeping a long-lived (disk) cache of OCSP responses.
+
+    This should be fairly simple. Any restarting of the service shouldn't
+blow away previous responses that were obtained. This doesn't need to be
+disk, just stable - and disk is an easy stable storage for most server
+operators.
+
+1.  Validate the OCSP responses to make sure they are something the client
+will accept.
+
+    There are a number of ways to botch this on the server, and sadly, a number
+of ways in which CAs can botch their response generators. The most immediate
+and obvious issues are situations where you have a 'revoked' response, or when
+you receive an OCSP 'tryLater' or 'internalError' response. However, there are
+also more subtle issues, like making sure the OCSP response is actually
+well-formed (sometimes uploads to CDNs are botched), is time valid for the
+current time (sometimes the CDNs serve stale files), is for the certificate
+requested (yes, sadly, really), and is free of PKI-related errors (for
+example, the delegated OCSP signer's certificate being expired).
+
+1.  Refresh the response, in the background, with sufficient time before
+expiration.
+
+    A rule of thumb would be to fetch at notBefore + (notAfter - notBefore) / 2,
+which is saying "start fetching halfway through the validity period". You want
+to be able to handle situations like the OCSP responder giving you junk, but
+also sufficient time to raise an alert if something has gone really wrong.
+
+    What you do NOT want to do is start OCSP fetching the first time you need
+it, or wait until the response is fully expired - that creates really
+terrible experiences all around, and makes your CA an even bigger point of
+failure.
+
+1.  That said, even with background refreshing, such a system should observe
+the [Lightweight OCSP Profile of RFC 5019](https://tools.ietf.org/html/rfc5019).
+
+    This more or less boils down to "Use `GET` requests whenever possible, and
+observe HTTP cache semantics." Given how complicated the cache semantics can
+be to get right in a client, this can be surprisingly hard to implement
+correctly.
+
+1.  As with any system doing background requests on a remote server, don't be
+a jerk and hammer the server when things are bad.
+
+    The Internet is a strange and wonderful place, and sometimes servers and
+networks have issues. When a server supporting OCSP stapling has trouble
+getting a response, hopefully it does something smarter than just retry in a
+busy loop, hammering the OCSP server into further oblivion. This may seem
+implied by the previous two remarks, but it's worth spelling out.
+
+1.  Distributed or proxiable fetching
+
+    From talking with server operators, a variety of situations are brought up
+as challenges for OCSP stapling. One common bucket is the problem of front-end
+and back-end splits - there may be thousands of FE servers, all with the same
+certificate, all needing to staple an OCSP response. You don't want to have
+all of them hammering the OCSP server - ideally, you'd have one request, made
+in the backend, whose response is then distributed to all of them.
+
+    A variation of this problem is FEs that aren't actually allowed to
+initiate outbound connections. Sometimes it's required that the FE talk to a
+proxy server, sometimes it's just outright blocked - so a system should be
+robust in handling that distribution.
+
+    This may not be a problem for the OCSP daemon to solve - it could be that
+the matter is just treated as a general configuration management/distribution
+problem - but at least it should be clear to those deploying the config what
+the tradeoffs are. For example, is it possible for the config distribution
+system to mangle responses? Should FEs still check the validity of incoming
+responses?
+
+1.  The ability to serve old responses while fetching new responses.
+
+    That is, it shouldn't be mutually exclusive - it's not that there is the
+'ONE TRUE RESPONSE' - some flexibility for multiple responses is needed.
+
+1.  Some idea of what to do when "things go bad".
+
+    What happens when it's been 7 days, no new OCSP response can be obtained,
+and the current response is about to expire? Do you:
+    1. Stop the (web/email/ftp/xmpp) service?
+    1. Stop serving stapled OCSP responses?
+
+    Especially in a world where Must-Staple becomes more prevalent, what
+action should be taken when things go badly wrong? If it's a Must-Staple cert,
+it might be more beneficial to fully stop the service (thus causing monitoring
+to really flip out) rather than serve bad responses or no response, both of
+which may result in even worse user experiences.
+
+1.  Configurable OCSP responder per-certificate-being-checked.
+
+    The CA/Browser Forum's Baseline Requirements allow CAs to omit the
+`authorityInfoAccess` extension for situations where the subscriber has agreed
+to staple. This agreement can be done via contractual means or technical means,
+which is to say that it's not predicated on the Must-Staple extension in the
+certificate. The reason for this omission is to allow for smaller certificates,
+which offsets (by a very small amount) the size increase of the OCSP response.
+
+    For these certificates, the server operator will need to configure what
+the OCSP responder URL is for that certificate.
+
+1.  Staple by default.
+
+    If you can get all the above worked out, with sane behaviours, there is
+very little reason that OCSP stapling shouldn't be on by default. Make it
+happen!
+
+If this seems like an unfairly long list, the reality is that virtually all
+of this is supported by Microsoft IIS services today. The Microsoft
+documentation is a bit spread out, but [this](https://technet.microsoft.com/en-us/library/ee619754(v=ws.10).aspx)
+is good for starters, and [this](https://technet.microsoft.com/en-us/library/ee619723(v=ws.10).aspx) is good
+for further reading.
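
The halfway rule of thumb for refreshing and the advice not to hammer the responder can be sketched as follows, in Python. This is an illustration under assumptions, not a reference implementation: the function names, the 60-second base delay, and the one-hour cap are made up for the example, and `not_before`/`not_after` stand for the OCSP response's `thisUpdate` and `nextUpdate` fields.

```python
import random
from datetime import datetime, timedelta, timezone


def refresh_time(not_before: datetime, not_after: datetime) -> datetime:
    """When to start refetching: halfway through the validity period."""
    return not_before + (not_after - not_before) / 2


def retry_delay(attempt: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff with full jitter: the ceiling doubles on every
    failed attempt up to `cap`, and the actual delay is drawn uniformly
    below it so that many servers do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


if __name__ == "__main__":
    start = datetime(2015, 12, 1, tzinfo=timezone.utc)
    end = start + timedelta(days=7)
    print("refresh at:", refresh_time(start, end).isoformat())
    for attempt in range(5):
        print("retry", attempt, "after up to",
              min(3600.0, 60.0 * 2 ** attempt), "seconds")
```

Starting at the halfway point leaves the second half of the validity period for retries and alerting, which is the margin the list above asks for.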