Full disclosure: I am NOT a security expert. I don’t even have a detailed understanding of the TLS protocol or the x509 spec. This blog post is based only on my high-level understanding of public key crypto.
TL;DR? Skip to the idea.
Today is a bad day
Today sees the disclosure of what is probably one of the worst security vulnerabilities in recent times. The repercussions are significantly worse than Apple’s goto fail bug. In short, openssl, one of the most widely used SSL/TLS libraries for online servers, has a bug which allows attackers to read memory from processes using the openssl library: up to 64 KB per malicious heartbeat request, and they can send as many requests as they like. The memory of these processes always includes the server’s private key. Among other things, this means attackers can retrieve the private key of any server running a vulnerable version of openssl, allowing them to:
- Man-In-The-Middle any visitor to your site or user of your services from this point onward.
- Decrypt any past sessions that were recorded and not protected by perfect forward secrecy (of which there are a LOT). This includes sensitive information such as user passwords, confidential messages and files: basically anything the user sent to or received from your servers.
Attackers can also collect any other information held in memory, such as webpage content and private user data, without having to perform a MITM attack at all.
Worse still, the attack is almost completely undetectable, so you can’t tell whether your key has already been stolen. And the vulnerability has been present for the best part of two years. (More Details Here)
Am I affected?
You can check whether your servers are vulnerable by using this external service: http://filippo.io/Heartbleed. (Don’t forget to test any services that use SSL/TLS, not just HTTPS, e.g. IMAP, SMTP…)
For example, Facebook was only patched at around 12PM GMT (8/4/2014), 15 hours after the post hit Hacker News. Many other large websites have also been left vulnerable for a huge window.
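The external service probes a live server, which is the check that matters. As a quick local complement, here is a minimal sketch that tests whether an OpenSSL version number falls in the affected range (1.0.1 through 1.0.1f, plus 1.0.2-beta1; 1.0.1g is the first fixed 1.0.1 release). It assumes Python’s ssl.OPENSSL_VERSION_INFO tuple format, and the helper name is hypothetical. It only inspects the OpenSSL that Python itself is linked against, which may not be the copy your web or mail server uses. Some vendors (Debian and Red Hat among them) backported the fix without changing the version number, so a positive result means "investigate", not proof.

```python
import ssl


def is_heartbleed_version(info: tuple) -> bool:
    """Return True if an OPENSSL_VERSION_INFO-style tuple names an affected release.

    The tuple is (major, minor, fix, patch, status): patch letters map to
    numbers ('a' == 1, ..., 'f' == 6, 'g' == 7) and status 15 marks a final release.
    """
    major, minor, fix, patch, status = info
    if (major, minor, fix) == (1, 0, 1):
        return patch <= 6          # 1.0.1 ... 1.0.1f
    if (major, minor, fix, patch) == (1, 0, 2, 0):
        return status == 1         # 1.0.2-beta1
    return False


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    if is_heartbleed_version(ssl.OPENSSL_VERSION_INFO):
        print("Version is in the affected range (check for vendor backports).")
    else:
        print("Version is outside the affected range.")
```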
How can we stop this!?
This is bad, eh? In short, if you are affected, all you can do is this:
- Recompile openssl with the -DOPENSSL_NO_HEARTBEATS flag or install the latest version (1.0.1g). If neither of these is an immediate option, and you can sacrifice HTTPS for a while (e.g. you are only running a blog), then you should disable SSL until you can get this fixed, so that your private keys are less likely to be stolen.
- Generate a new keypair for your server.
- Ask your CA to revoke your old certificate so that it is published in their revocation lists, and get a new certificate issued for the new keypair.
This will help somewhat with preventing future Man-In-The-Middle attacks, but there is nothing you can do to stop attackers from decrypting past messages.
Unfortunately… revoking certificates doesn’t really work… which means that you can’t really protect your users from MITM attacks either.
Revocation Doesn’t Work!?
Yes. It’s not scalable. SSL/TLS is a great system: it’s distributed and very fast. Verification of certificate chains can all take place locally, and no ‘looking up’ is needed. Introducing certificate revocation takes this beautiful distributed system and makes it centralised.
For certificate revocation to work correctly, browsers need to look up a site’s certificate in a CRL (Certificate Revocation List)… for EVERY connection. This is not scalable, so a lot of browsers cache the validity of certificates for up to 6 months. Worse still, some don’t use revocation lists at all; they’re just too slow, right?
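To make the trade-off concrete, here is a minimal sketch of a client that caches revocation answers. The class and its parameters are hypothetical, not any browser’s actual code, and fetch_revoked stands in for the real network lookup (downloading a CRL or sending an OCSP query). Any revocation that happens while an answer is cached goes unnoticed until the entry expires, and that gap is exactly the window an attacker holding a stolen key gets to keep using it.

```python
import time
from typing import Callable, Dict, Tuple


class CachedRevocationChecker:
    """Remembers each revocation answer for max_age seconds.

    fetch_revoked is a hypothetical stand-in for the slow network lookup;
    it returns True if the certificate with that serial number is revoked.
    """

    def __init__(self, fetch_revoked: Callable[[int], bool], max_age: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch_revoked
        self._max_age = max_age
        self._clock = clock
        self._cache: Dict[int, Tuple[bool, float]] = {}  # serial -> (revoked, checked_at)

    def is_revoked(self, serial: int) -> bool:
        now = self._clock()
        cached = self._cache.get(serial)
        if cached is not None and now - cached[1] < self._max_age:
            return cached[0]               # fast path: no lookup, answer may be stale
        revoked = self._fetch(serial)      # slow path: ask the CA
        self._cache[serial] = (revoked, now)
        return revoked


if __name__ == "__main__":
    revoked_serials = set()
    fake_now = [0.0]
    checker = CachedRevocationChecker(lambda s: s in revoked_serials,
                                      max_age=100.0, clock=lambda: fake_now[0])
    print(checker.is_revoked(42))   # False: fetched and cached
    revoked_serials.add(42)         # the site owner revokes the stolen key's certificate
    fake_now[0] = 50.0
    print(checker.is_revoked(42))   # False: still served from the cache
    fake_now[0] = 150.0
    print(checker.is_revoked(42))   # True: cache expired, lookup sees the revocation
```

Shrinking max_age narrows the stale window but multiplies the lookups, which is the scalability problem described above.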
Well what can we do?
I believe if you are relying on revocation infrastructure during a crisis like this, you’re doing PKI wrong…
This problem came about because of a missing bounds check in the source code for openssl. This is not the first time an issue like this has caused a serious security problem, and it is undoubtedly not going to be the last. Writing these kinds of libraries in a higher-level language will not fix the problem either. It may arguably make the problem somewhat less likely, but it will by no means make it impossible.
We can do better than we currently do, and protect against issues where our SSL libraries may be missing a single bounds check and allow arbitrary access to our process’s memory.
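To show what a single missing bounds check looks like, here is a toy Python model of the heartbeat handler. It is not openssl’s code. The message layout follows the TLS heartbeat extension (RFC 6520: a 1-byte type, a 2-byte payload length, the payload, then at least 16 bytes of padding), and the "process memory" is just a bytearray with a fake secret placed right after the received record. The vulnerable handler trusts the length the client claims; the fixed one applies the same length check that openssl 1.0.1g added and silently discards bad messages. The function names and the secret are hypothetical.

```python
import struct
from typing import Optional

HEARTBEAT_REQUEST = 1
HEARTBEAT_RESPONSE = 2
MIN_PADDING = 16

# Fake secret that happens to sit in memory right after the received record.
SECRET = b"PRIVATE-KEY-MATERIAL"


def build_request(payload: bytes, claimed_len: Optional[int] = None) -> bytes:
    """type (1 byte) | payload_length (2 bytes, big-endian) | payload | padding."""
    length = len(payload) if claimed_len is None else claimed_len
    return struct.pack(">BH", HEARTBEAT_REQUEST, length) + payload + bytes(MIN_PADDING)


def respond_vulnerable(memory: bytearray, record_len: int) -> bytes:
    _msg_type, claimed = struct.unpack_from(">BH", memory, 0)
    # BUG: echoes `claimed` bytes without comparing it to record_len, so the
    # copy runs past the end of the record into whatever memory follows.
    # (Python slicing stops at the end of the buffer; C's memcpy would not.)
    echoed = bytes(memory[3:3 + claimed])
    # Real responses also append random padding; omitted here.
    return struct.pack(">BH", HEARTBEAT_RESPONSE, claimed) + echoed


def respond_fixed(memory: bytearray, record_len: int) -> Optional[bytes]:
    if 1 + 2 + MIN_PADDING > record_len:
        return None                      # too short to be a heartbeat: discard
    _msg_type, claimed = struct.unpack_from(">BH", memory, 0)
    if 1 + 2 + claimed + MIN_PADDING > record_len:
        return None                      # claimed length exceeds the record: discard
    echoed = bytes(memory[3:3 + claimed])
    return struct.pack(">BH", HEARTBEAT_RESPONSE, claimed) + echoed


if __name__ == "__main__":
    record = build_request(b"hi", claimed_len=40)   # lies about its length
    memory = bytearray(record) + SECRET
    print(respond_vulnerable(memory, len(record)))  # echo includes SECRET
    print(respond_fixed(memory, len(record)))       # None
```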
How? Subkeys! and More Certificates!
I am not sure if the current SSL/TLS and x509 specs allow this (as I said, I am unfamiliar with both specs). However, subkeys are a brilliant high-level concept, and one which is frequently used in PKIs such as OpenPGP.
- Generate a new keypair each day (or some other small interval).
- Create a subkey certificate (a certificate from your primary key signing the subkey, valid only for the current day). (A subkey cert in this instance would probably just be an x509 cert with the same Common Name (CN)).
- Only load the subkeys into the processes using openssl, so that if and when an attacker gains access to the entire memory of such a process, the worst that can happen is that they can decrypt messages from that day, and MITM for the rest of the day.
- When a new subkey is required due to expiry, ask a child process to generate a new one; never generate it in the same process as the one using openssl.
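The steps above can be modelled in a few lines, under stated assumptions. Real subkey certificates would be x509 certificates signed by the primary key, but Python’s standard library cannot create those, so the hypothetical SubkeyCert below only records the fields that matter for the argument: the CN and a validity window of one UTC day. A random identifier stands in for the freshly generated keypair, and signature checking is left out. The sketch shows the checks a client would add (CN matches, current time inside the window) and when the server should ask the key-holding process for a replacement.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Tuple


@dataclass(frozen=True)
class SubkeyCert:
    common_name: str     # same CN as the primary certificate
    not_before: datetime
    not_after: datetime  # exclusive: the subkey is dead from this instant on
    subkey_id: str       # hypothetical stand-in for the subkey's public key


def day_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return [start, end) of the UTC day containing the timezone-aware `now`."""
    start = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


def issue_daily_subkey_cert(primary_cn: str, now: datetime) -> SubkeyCert:
    # Meant to run in the separate process that holds the primary key. A real
    # version would generate a fresh keypair here and sign the certificate with
    # the primary key; this sketch only records the fields.
    start, end = day_window(now)
    return SubkeyCert(primary_cn, start, end, secrets.token_hex(16))


def client_accepts(cert: SubkeyCert, expected_cn: str, now: datetime) -> bool:
    # Signature verification against the primary key is omitted in this sketch.
    return cert.common_name == expected_cn and cert.not_before <= now < cert.not_after


def needs_rotation(cert: SubkeyCert, now: datetime) -> bool:
    """The TLS process asks for a new subkey once this returns True."""
    return now >= cert.not_after


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cert = issue_daily_subkey_cert("example.com", now)
    print(cert.not_before, cert.not_after)
    print(client_accepts(cert, "example.com", now))          # True
    print(needs_rotation(cert, now + timedelta(days=1)))     # True
```

Because the primary key never enters the TLS process, a memory leak there exposes at most one subkey, and that subkey stops being accepted at the end of its day.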
Allowing these kinds of certificates would also bring other benefits, such as having a single place to store the primary private key, where subkey certificates can be generated and then distributed to all load balancers, rather than having the private key stored on each of them individually.
This sort of behaviour should be the default for crypto libraries like
Going one step further, allowing subkey certificates that restrict the Common Name to a subset of the CN from the primary certificate would be great too. We could then allow, for example, self-management of wildcard domains. E.g. an organisation running example.com with a wildcard certificate for *.example.com could sign their own subkey certificate for foo.example.com and give that to the people running only that part of the organisation. These people could then set up openssl to create subkeys for this new certificate on a daily basis.
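The containment rule such a delegation needs is simple to state. Below is a hypothetical helper (not part of openssl or any x509 library) that decides whether a subkey certificate’s CN falls within the primary certificate’s CN, treating a leading "*." the way ordinary wildcard matching does: as exactly one extra label on the left. A client would run a check like this in addition to verifying that the primary key signed the subkey certificate.

```python
def within_delegation(subkey_cn: str, primary_cn: str) -> bool:
    """Return True if subkey_cn is a name that primary_cn is allowed to delegate.

    "*.example.com" covers exactly one extra label, so "foo.example.com"
    qualifies but "example.com" and "a.b.example.com" do not. A CN without a
    wildcard only covers itself. Comparison is case-insensitive and ignores a
    trailing dot.
    """
    sub = subkey_cn.lower().rstrip(".")
    primary = primary_cn.lower().rstrip(".")
    if primary.startswith("*."):
        suffix = primary[1:]                                  # e.g. ".example.com"
        label = sub[: -len(suffix)] if sub.endswith(suffix) else ""
        # Exactly one non-empty, non-wildcard label may precede the suffix.
        return bool(label) and "." not in label and "*" not in label
    return sub == primary
```

For the example above, within_delegation("foo.example.com", "*.example.com") is True, while a subkey certificate for "evil.com" would be rejected.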
Why this kind of delegation is not possible with x509 certificates boggles me, and if I am mistaken and it is possible, then why it is not currently in use everywhere boggles me even more!
We shouldn’t need to rely on a broken, unscalable revocation model. We should use time-limited subkeys instead, and not have primary keys stored in a front-facing process’s memory. Better yet, not even on the same machine.
I’m currently studying Computer Science at Oxford University in the UK, and am a past president of the Oxford University Computer Society. I have a passion for technology, in particular ubiquitous computing, communication and security.
- 2014/04/08 – 5:20pm GMT: Added clarification that the processes affected are those using the openssl libraries and not “openssl processes”.