The Mystery of OpenSSL (Ephemeral Keys)

…or why you should think before mindlessly disabling OpenSSL ciphers (scroll to the bottom of the page for the relevant Wu-Tang song)

On FIPS and puppet-agent packages

At Puppet, we provide puppet-agent AIO (batteries-included) packages for a variety of platforms, including RedHat 7 FIPS. FIPS stands for Federal Information Processing Standards: the standards and guidelines developed by NIST for computer systems used by non-military US government agencies and government contractors. They cover a wide range of topics, but the ones that affect us most are the cryptography requirements (FIPS 140). These are enforced by disabling insecure ciphers, short key lengths, weak algorithms like MD5, and so forth. Attempting to generate an MD5 hash on a FIPS-compliant machine should greet you with an error:

→ openssl md5 <<<i_should_not_use_this_for_hashing_passwords
Error setting digest
140397448918848:error:060800C8:digital envelope routines:EVP_DigestInit_ex:disabled for FIPS:crypto/evp/digest.c:135:
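The same check can be reproduced from code. The following is a minimal Python sketch. It assumes Python 3.9+ with hashlib backed by the system OpenSSL. On a FIPS-enforcing build, hashlib.md5() is expected to raise ValueError, although the exact behaviour depends on how the distribution patched Python. Both function names are made up for illustration.

```python
import hashlib


def md5_allowed(data: bytes = b"probe") -> bool:
    """Return True if MD5 can be used for security purposes on this system."""
    try:
        # On FIPS-enforcing builds this is expected to raise ValueError
        # (the Python counterpart of "disabled for FIPS" from the CLI).
        hashlib.md5(data)
    except ValueError:
        return False
    return True


def md5_checksum_only(data: bytes) -> str:
    """MD5 for non-security uses such as legacy checksums (Python 3.9+)."""
    # usedforsecurity=False declares a non-security use, which restricted
    # (e.g. FIPS) environments are expected to permit.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


if __name__ == "__main__":
    print("MD5 allowed for security use:", md5_allowed())
```

The usedforsecurity=False flag is meant for legacy checksums only. It does not make MD5 acceptable for anything security-related, passwords included.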

Since our packages target different use cases than, say, the Linux distro packages of Puppet (think air-gapped environments that require strict control over what gets installed from where), we essentially have to vendor all of Puppet’s third-party dependencies. It can be argued that this is not the prettiest approach, but it gives us control over the versions of the dependencies that Puppet uses. For example, the version of curl shipped in Debian 10 is 7.64 (February 2019), whereas we ship 7.77 (May 2021). To be fair, Debian does not bump curl during the lifetime of a release, but it does handle vulnerabilities by backporting patches onto the older version.

Now, to the matter at hand: OpenSSL. It is another dependency that we vendor, and most of the other software we ship depends on it (curl, Ruby, and our in-house C++ components like Facter 3 and the PXP Agent). It is also the only package that needs special FIPS attention, because it handles cryptography for every other package we ship.

RedHat 7 FIPS ships with OpenSSL 1.0.2k. The issue with OpenSSL 1.0.2 is that it reached EOL back in 2019. That means no more security fixes (at least not public ones) for that stream, so we had to find a way to update to 1.1.1, which is supported until 2023. To accomplish this, we ended up creating Frankenstein’s monster. We take the OpenSSL 1.1.1 source RPM for CentOS (formerly 8, now Stream 😤; either way, NOT compatible with RedHat 7 out of the box), give it some FIPS armor and make it build on RedHat 7. The recipe is best left unspoken, but as a hint: it involves writing patches that patch patches. Word has it that many who have seen these horrors have gone insane.

I should also mention that the CentOS source RPMs do not always contain the most recent OpenSSL release. In our case we had to update to 1.1.1l, but the latest SRPM was 1.1.1k. The best we could do was update to the latest available SRPM and apply the upstream patches for the high-severity CVEs.
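A small helper makes these letter releases (1.1.1c, 1.1.1k, 1.1.1l) easier to track. The following is a hedged Python sketch. It assumes the OpenSSL 1.x scheme, in which the trailing letter counts patch releases (a = 1, b = 2, ...). Python's ssl.OPENSSL_VERSION_INFO uses the same numbering in its fourth field when Python is linked against a 1.x OpenSSL. parse_openssl_version is a hypothetical helper, not part of any library.

```python
import re
import ssl
from typing import Tuple


def parse_openssl_version(version: str) -> Tuple[int, int, int, int]:
    """Map an OpenSSL 1.x version such as "1.1.1k" to (1, 1, 1, 11).

    The single trailing letter counts patch releases: a -> 1, b -> 2, ...
    Two-letter suffixes are not handled.
    """
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", version)
    if m is None:
        raise ValueError(f"unrecognised OpenSSL version: {version!r}")
    major, minor, fix, letter = m.groups()
    patch = ord(letter) - ord("a") + 1 if letter else 0
    return int(major), int(minor), int(fix), patch


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)           # the OpenSSL this interpreter is linked against
    print(ssl.OPENSSL_VERSION_INFO[:4])  # (major, minor, fix, patch) as reported by Python
    # Parsed tuples sort correctly: 1.1.1c < 1.1.1k < 1.1.1l.
    print(parse_openssl_version("1.1.1k") < parse_openssl_version("1.1.1l"))
```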

Now that I’ve finished dumping all that exposition (this happens all the time when I try to explain stuff from work), I can finally describe what happened on the sunny day we decided to bump our FIPS OpenSSL to a newer version.

Bumping OpenSSL FIPS

The task came with some uncertainty. The version we shipped before the upgrade was 1.1.1c, released in March 2020, so bumping to 1.1.1k meant integrating over a year’s worth of changes that could affect our other components in unexpected ways.

At first, things seemed to go well. After some fights with patches that failed to apply cleanly, we got OpenSSL to build, and the rest of the agent built as before. Then tests started to fail: our entire PXP Agent test suite errored out with failed TLS handshake messages. It quickly became clear that our OpenSSL shenanigans were to blame, but what had actually happened?

It seemed that with the new OpenSSL, pxp-agent was unable to negotiate a cipher with the pcp-broker it needed to connect to. To strip away additional complexity, I looked for the simplest reproduction case, one that didn’t involve the pxp-agent at all. I simply tried to connect to the pcp-broker (which ran on port 8142) via the OpenSSL CLI:

→ openssl s_client -showcerts cryptic-spoon.delivery.puppetlabs.net:8142
[...]
140684005754688:error:141A3066:SSL routines:tls_process_ske_dhe:bad dh value:ssl/statem/statem_clnt.c:2136:
[...]
---
SSL handshake has read 2573 bytes and written 359 bytes
Verification error: unable to verify the first certificate
---
New, (NONE), Cipher is (NONE)
Server public key is 4096 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : 0000
    Session-ID: 1D5CFE972020C174197C6C83B3FE23B152980FC0C8FF2382E0257F74A4712EFC
    Session-ID-ctx: 
    Master-Key: 
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1631298865
    Timeout   : 7200 (sec)
    Verify return code: 21 (unable to verify the first certificate)
    Extended master secret: yes
---

A couple of things jumped out: the bad dh value error, and the fact that no TLS cipher was negotiated (Cipher is (NONE)). Let’s compare this with the output from the previous OpenSSL:

--- openssl-1.1.1c	2021-09-10 18:46:21.594951235 +0000
+++ openssl-1.1.1k	2021-09-10 18:46:00.507948966 +0000
@@ -4,6 +4,9 @@
 depth=0 CN = cryptic-spoon.delivery.puppetlabs.net
 verify error:num=21:unable to verify the first certificate
 verify return:1
+depth=0 CN = cryptic-spoon.delivery.puppetlabs.net
+verify return:1
+140453760333632:error:141A3066:SSL routines:tls_process_ske_dhe:bad dh value:ssl/statem/statem_clnt.c:2136:
 CONNECTED(00000004)
 ---
 Certificate chain
@@ -50,20 +53,12 @@
 issuer=CN = Puppet CA: cryptic-spoon.delivery.puppetlabs.net
 
 ---
-Acceptable client certificate CA names
-CN = Puppet CA: cryptic-spoon.delivery.puppetlabs.net
-CN = Puppet Root CA: e5a215fc1589ce
-Client Certificate Types: ECDSA sign, RSA sign, DSA sign
-Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
-Shared Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
-Peer signing digest: SHA256
-Peer signature type: RSA-PSS
-Server Temp Key: DH, 1024 bits
+No client certificate CA names sent
 ---
-SSL handshake has read 2624 bytes and written 564 bytes
+SSL handshake has read 2573 bytes and written 359 bytes
 Verification error: unable to verify the first certificate
 ---
-New, TLSv1.2, Cipher is DHE-RSA-AES128-GCM-SHA256
+New, (NONE), Cipher is (NONE)
 Server public key is 4096 bit
 Secure Renegotiation IS supported
 Compression: NONE
@@ -71,14 +66,14 @@
 No ALPN negotiated
 SSL-Session:
     Protocol  : TLSv1.2
-    Cipher    : DHE-RSA-AES128-GCM-SHA256
-    Session-ID: 9ABB496A3725A673D801AD5BA249CEE19AFF3E204C2BFF15D8B02CD479F74F04
+    Cipher    : 0000
+    Session-ID: 06DEA8591C656BFA037818DDACC9D9DF24F3561B008D1E33FE48FDEF90EEDE2A
     Session-ID-ctx: 
-    Master-Key: 83F6313291B0E431A304A19D9928B7EF2DD4F084CF614870CA2D45A1180F978DBA7B0C79B8077DA2C2459923D4FD168F
+    Master-Key: 
     PSK identity: None
     PSK identity hint: None
     SRP username: None
-    Start Time: 1631299581
+    Start Time: 1631299560
     Timeout   : 7200 (sec)
     Verify return code: 21 (unable to verify the first certificate)
     Extended master secret: yes
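In hindsight, the old client’s output already held the key clue, Server Temp Key: DH, 1024 bits, while the failing run printed neither a cipher nor a temp key. The following is a minimal Python sketch that pulls those two lines out of s_client output, so runs can be compared mechanically. It assumes the OpenSSL 1.1.1 output format shown above. The 2048-bit threshold mirrors the value that eventually fixed things, not a documented OpenSSL limit. All names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class HandshakeSummary(NamedTuple):
    cipher: Optional[str]         # None when s_client reports "Cipher is (NONE)"
    temp_key_type: Optional[str]  # e.g. "DH", or "ECDH, P-256" for elliptic curves
    temp_key_bits: Optional[int]


def summarize_s_client(output: str) -> HandshakeSummary:
    """Extract the negotiated cipher and the Server Temp Key from s_client output."""
    cipher = None
    m = re.search(r"^New, .*, Cipher is (\S+)$", output, re.MULTILINE)
    if m and m.group(1) != "(NONE)":
        cipher = m.group(1)
    key_type, key_bits = None, None
    m = re.search(r"^Server Temp Key: (.+), (\d+) bits$", output, re.MULTILINE)
    if m:
        key_type, key_bits = m.group(1), int(m.group(2))
    return HandshakeSummary(cipher, key_type, key_bits)


def uses_small_dh(summary: HandshakeSummary, min_bits: int = 2048) -> bool:
    """True if the server offered a finite-field DH key shorter than min_bits."""
    return (summary.temp_key_type == "DH"
            and summary.temp_key_bits is not None
            and summary.temp_key_bits < min_bits)


# Trimmed samples taken from the 1.1.1c vs. 1.1.1k diff.
OLD_CLIENT = """\
Server Temp Key: DH, 1024 bits
---
New, TLSv1.2, Cipher is DHE-RSA-AES128-GCM-SHA256
"""
NEW_CLIENT = """\
No client certificate CA names sent
---
New, (NONE), Cipher is (NONE)
"""

if __name__ == "__main__":
    for label, text in (("1.1.1c", OLD_CLIENT), ("1.1.1k", NEW_CLIENT)):
        s = summarize_s_client(text)
        print(label, s, "small DH" if uses_small_dh(s) else "")
```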

Clearly, something had gone wrong. My new assumption was that the DHE-RSA-AES128-GCM-SHA256 cipher, which had worked before, no longer worked. I confirmed this by running s_client -showcerts again with the broken OpenSSL while explicitly passing -cipher DHE-RSA-AES128-GCM-SHA256, and I was greeted with the same error. Next, I put together a loop that cycled through all supported ciphers and tried to connect with each one of them:

# 0 means the connection succeeded, 1 indicates a failure
# (TLS 1.3 suites such as TLS_AES_128_GCM_SHA256 are selected with -ciphersuites, not -cipher)
for cipher in $(openssl ciphers -s | tr ':' '\n' | sort); do
  echo Q | openssl s_client -cipher "$cipher" -connect cryptic-spoon.delivery.puppetlabs.net:8142 &>/dev/null
  echo "$cipher : $?"
done
AES128-GCM-SHA256 : 1
AES128-SHA : 1
AES128-SHA256 : 1
AES256-GCM-SHA384 : 1
AES256-SHA : 1
AES256-SHA256 : 1
DHE-RSA-AES128-GCM-SHA256 : 1
DHE-RSA-AES128-SHA : 1
DHE-RSA-AES128-SHA256 : 1
DHE-RSA-AES256-GCM-SHA384 : 1
DHE-RSA-AES256-SHA : 1
DHE-RSA-AES256-SHA256 : 1
ECDHE-ECDSA-AES128-GCM-SHA256 : 1
ECDHE-ECDSA-AES128-SHA : 1
ECDHE-ECDSA-AES128-SHA256 : 1
ECDHE-ECDSA-AES256-GCM-SHA384 : 1
ECDHE-ECDSA-AES256-SHA : 1
ECDHE-ECDSA-AES256-SHA384 : 1
ECDHE-RSA-AES128-GCM-SHA256 : 0
ECDHE-RSA-AES128-SHA : 1
ECDHE-RSA-AES128-SHA256 : 1
ECDHE-RSA-AES256-GCM-SHA384 : 0
ECDHE-RSA-AES256-SHA : 1
ECDHE-RSA-AES256-SHA384 : 1
TLS_AES_128_GCM_SHA256 : 1
TLS_AES_256_GCM_SHA384 : 1
TLS_CHACHA20_POLY1305_SHA256 : 1
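The same probe can be expressed with Python’s ssl module. The following is a minimal sketch. It assumes Python 3.7+ built against OpenSSL 1.1.x or later. Like the shell loop, it skips certificate verification. It also caps the protocol at TLS 1.2, because set_ciphers() only governs suites for TLS 1.2 and below. probe_cipher is a hypothetical name.

```python
import socket
import ssl


def probe_cipher(host: str, port: int, cipher: str, timeout: float = 5.0) -> bool:
    """Offer only `cipher` in a TLS <= 1.2 handshake; True if it is negotiated."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Like the s_client loop, don't verify the certificate: we only care
    # whether the cipher can be negotiated. check_hostname must go first.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # TLS 1.3 suites are not controlled by set_ciphers(), so stay on TLS 1.2.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    try:
        ctx.set_ciphers(cipher)
    except ssl.SSLError:
        return False  # unknown or disallowed name (e.g. a TLS 1.3 suite)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.cipher()[0] == cipher
    except OSError:  # refused connections, timeouts and ssl.SSLError alike
        return False


# Example (needs network access to the broker; mirrors the shell loop):
# for name in sorted(c["name"] for c in ssl.create_default_context().get_ciphers()):
#     ok = probe_cipher("cryptic-spoon.delivery.puppetlabs.net", 8142, name)
#     print(name, ":", 0 if ok else 1)
```

Python’s default cipher list is not identical to openssl ciphers -s, so the set of names probed may differ slightly from the shell version.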

On one hand, there are 2 ciphers that seemed to work. On the other hand… there are only 2 ciphers that seemed to work. Not knowing how to handle this, I applied a dirty fix: disabling all DHE ciphers, which the server seemed to prefer. After this, the connection fell back to ECDHE-RSA-AES128-GCM-SHA256, which appeared to work. Disabling additional ciphers on an already hardened FIPS machine sounded problematic, so I powered up a RedHat 8 VM, enabled FIPS, and checked the situation from there as well. The system-provided OpenSSL on RedHat 8 behaved the same way, which made me more confident about disabling the ciphers. I opened a PR and tests started passing again, but I still wasn’t convinced.
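Conceptually, the stopgap boils down to a cipher-string exclusion. The following is a hedged Python sketch of the idea, not the actual change from the PR. It assumes Python’s ssl module and OpenSSL’s cipher-string syntax. In that syntax, !kDHE drops suites with finite-field ephemeral DH key exchange but leaves ECDHE suites alone. context_without_dhe is a hypothetical helper.

```python
import ssl


def context_without_dhe() -> ssl.SSLContext:
    """Client context that never offers finite-field DHE suites (TLS 1.2 and below).

    "DEFAULT:!kDHE" starts from OpenSSL's default list and removes suites whose
    key exchange is ephemeral finite-field DH (e.g. DHE-RSA-AES128-GCM-SHA256);
    ECDHE suites such as ECDHE-RSA-AES128-GCM-SHA256 remain available.
    """
    ctx = ssl.create_default_context()
    ctx.set_ciphers("DEFAULT:!kDHE")
    return ctx


if __name__ == "__main__":
    # get_ciphers() also lists TLS 1.3 suites, which set_ciphers() does not affect.
    for cipher in context_without_dhe().get_ciphers():
        print(cipher["name"])
```

This only hides the symptom. The DHE suites themselves were not the problem.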

Inspecting packets

I hadn’t nuked my test environment yet, so I decided to keep sleuthing. The pcp-broker machine also ran a puppetserver instance on port 8140, so I tried connecting to that as well (especially since the core Puppet test suite passed with the newer OpenSSL, so I knew this had to work).

Lo and behold, the connection to puppetserver succeeded, using the exact same cipher that had failed against pcp-broker:

→ openssl s_client -showcerts cryptic-spoon.delivery.puppetlabs.net:8140
[...]
New, TLSv1.2, Cipher is DHE-RSA-AES128-GCM-SHA256
[...]

I installed tshark, since Wireshark proper is a GUI application and I didn’t feel like setting up a desktop environment/X server on the RedHat 7 FIPS test machine. After numerous unsuccessful attempts at bashing in random CLI options in the hope of getting human-readable output, I found the correct incantation:

tshark -nn -i ens33 -s 0 host cryptic-spoon.delivery.puppetlabs.net and port 8142 -o http.ssl.port:8142 -V -x

I’m pretty sure most of the options don’t do anything or are straight-up wrong, but once the correct output showed up, I decided not to mess with it anymore.

Using the broken OpenSSL, I captured the handshakes with both puppetserver and pcp-broker and started comparing them.

As expected, both services negotiated the same TLS cipher:

# this stands for DHE-RSA-AES128-GCM-SHA256
Cipher Suite: TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 (0x009e)

However, the server key exchange parameters differed. Both the Diffie-Hellman prime p and the server’s temporary public key were longer in the puppetserver handshake (256 vs. 128 bytes, i.e. 2048 vs. 1024 bits). I quickly correlated this with the bad dh value error I had seen initially:

--- puppetserver	2021-09-10 22:40:01.959580598 +0300
+++ pcp-broker  	2021-09-10 22:40:01.959580598 +0300
@@ -872,19 +872,19 @@
                     encrypted: 33efa9834b7eb5d6544f3d59d0c6a402079fbb5255315f01...
         Handshake Protocol: Server Key Exchange
             Handshake Type: Server Key Exchange (12)
-            Length: 1035
+            Length: 779
             Diffie-Hellman Server Params
-                p Length: 256
-                p: ffffffffffffffffadf85458a2bb4a9aafdc5620273d3cf1...
+                p Length: 128
+                p: ffffffffffffffffc90fdaa22168c234c4c6628b80dc1cd1...
                 g Length: 1
                 g: 02
-                Pubkey Length: 256
-                pubkey: cc6a68cac65c01d3898bb156bb85743c5db4ccf4ee915d06...
+                Pubkey Length: 128
+                pubkey: 87d0a590ee8df3061f6a4d0aefa660f6e9abacf8c2835246...
                 Signature Hash Algorithm: 0x0804
                     Signature Hash Algorithm Hash: Unknown (8)
                     Signature Hash Algorithm Signature: Unknown (4)
                 Signature Length: 512
-                signature: 83ae947fd7dbbd7483c52b2b285b0bf7d4894309e0a59424...
+                signature: 5fb34db22a892338d77936d7896478aac42f999dc0cb8626...
         Handshake Protocol: Certificate Request
             Handshake Type: Certificate Request (13)
             Length: 154
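The two message lengths in the diff add up once you lay out the TLS 1.2 ServerKeyExchange for DHE suites (RFC 5246, section 7.4.3). p, g and the public value Ys each carry a 2-byte length prefix. These are followed by a 2-byte signature algorithm and a 2-byte signature length. The signature algorithm 0x0804 is rsa_pss_rsae_sha256 in RFC 8446 terms, which tshark shows as Unknown. The 512-byte signature matches the 4096-bit server key. The prime prefixes are telling too. The pcp-broker prime starts like the well-known MODP primes of RFC 2409/3526 (at 1024 bits, the "Second Oakley Group"). The puppetserver prime starts like the RFC 7919 ffdhe primes (here ffdhe2048). The following is a small Python sketch of both checks. The helper names are hypothetical, and the prefix matching is only a heuristic.

```python
# First 24 bytes (hex) of two well-known families of published DH primes.
MODP_PREFIX = "ffffffffffffffffc90fdaa22168c234c4c6628b80dc1cd1"   # RFC 2409 / RFC 3526 MODP
FFDHE_PREFIX = "ffffffffffffffffadf85458a2bb4a9aafdc5620273d3cf1"  # RFC 7919 ffdhe


def ske_dhe_length(p_len: int, g_len: int, ys_len: int, sig_len: int) -> int:
    """Length in bytes of a TLS 1.2 DHE ServerKeyExchange body.

    p, g and Ys each carry a 2-byte length prefix; the signature is preceded
    by a 2-byte signature algorithm and its own 2-byte length.
    """
    return (2 + p_len) + (2 + g_len) + (2 + ys_len) + 2 + (2 + sig_len)


def guess_dh_group(p_hex: str, p_len: int) -> str:
    """Heuristically name a DH group from its prime's leading bytes and length."""
    bits = p_len * 8
    p_hex = p_hex.lower()
    if p_hex.startswith(FFDHE_PREFIX):
        return f"RFC 7919 ffdhe{bits}"
    if p_hex.startswith(MODP_PREFIX):
        return f"RFC 2409/3526 MODP, {bits}-bit"
    return f"unknown {bits}-bit group"


if __name__ == "__main__":
    # Values from the tshark diff: puppetserver (-) vs. pcp-broker (+).
    print(ske_dhe_length(256, 1, 256, 512))  # 1035
    print(ske_dhe_length(128, 1, 128, 512))  # 779
    print(guess_dh_group("ffffffffffffffffadf85458a2bb4a9aafdc5620273d3cf1", 256))
    print(guess_dh_group("ffffffffffffffffc90fdaa22168c234c4c6628b80dc1cd1", 128))
```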

This clearly had to be something configurable on the server side. The pcp-broker is a Clojure application running on the JVM, and after a bit of searching I found the JVM system property that sets the size of the ephemeral Diffie-Hellman key: -Djdk.tls.ephemeralDHKeySize=2048.

This had to go in the JVM_OPTS and LEIN_JVM_OPTS environment variables. I exported them and restarted the broker, then retried the connection:

→ openssl s_client -showcerts cryptic-spoon.delivery.puppetlabs.net:8142
CONNECTED(00000004)
[...]
---
Acceptable client certificate CA names
CN = Puppet CA: cryptic-spoon.delivery.puppetlabs.net
CN = Puppet Root CA: e5a215fc1589ce
Client Certificate Types: ECDSA sign, RSA sign, DSA sign
Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
Shared Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: DH, 2048 bits
---
SSL handshake has read 2880 bytes and written 682 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.2, Cipher is DHE-RSA-AES128-GCM-SHA256
Server public key is 4096 bit
Secure Renegotiation IS supported
[...]

It worked! After this, we reverted the cipher removal and applied the fix in the proper place: on the server side.
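If the broker is started from a script rather than an interactive shell, the same environment change can be prepared in code. The following is a minimal Python sketch. It assumes the broker’s launcher honours JVM_OPTS and LEIN_JVM_OPTS, as described above. with_dh_key_size is a hypothetical helper, and splitting on whitespace assumes no option contains quoted spaces.

```python
import os
from typing import Dict, Mapping

DH_PREFIX = "-Djdk.tls.ephemeralDHKeySize="
DH_FLAG = DH_PREFIX + "2048"


def with_dh_key_size(env: Mapping[str, str], flag: str = DH_FLAG) -> Dict[str, str]:
    """Copy `env`, setting the ephemeral DH key size in JVM_OPTS and LEIN_JVM_OPTS.

    Any existing ephemeralDHKeySize setting is replaced, so the flag appears once.
    """
    new_env = dict(env)
    for var in ("JVM_OPTS", "LEIN_JVM_OPTS"):
        opts = [o for o in new_env.get(var, "").split() if not o.startswith(DH_PREFIX)]
        opts.append(flag)
        new_env[var] = " ".join(opts)
    return new_env


if __name__ == "__main__":
    env = with_dh_key_size(os.environ)
    print(env["JVM_OPTS"])
    print(env["LEIN_JVM_OPTS"])
    # Pass `env` as env= to whatever (hypothetical) subprocess call restarts the broker.
```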

This was a difficult issue to track down, mainly because the openssl CLI shows little information when the TLS handshake fails. Note how, for a successful handshake, it prints the negotiated cipher and the Server Temp Key, whereas the output for a failing one omits both. These details are still visible in the tshark capture, even when the connection does not succeed.

Since in our case this ended up being a server issue, comparing the old and new OpenSSL clients against the same target didn’t teach us much beyond the fact that the same cipher no longer worked. It was only when I compared different targets that I noticed the pubkey length differed.
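The ServerKeyExchange arrives before the client aborts, so the DH parameters can be read straight off the wire even when s_client prints nothing useful. The following is a minimal Python sketch of parsing that message per RFC 5246, section 7.4.3. It assumes a TLS 1.2 DHE suite and a handshake body that has already been extracted (e.g. from a capture), without the 4-byte handshake header. It does not verify the signature. The names are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class ServerDHParams(NamedTuple):
    p: bytes          # DH prime; len(p) * 8 is the group size in bits
    g: bytes          # generator
    ys: bytes         # server's ephemeral public value
    sig_alg: int      # TLS 1.2 SignatureAndHashAlgorithm, e.g. 0x0804
    signature: bytes


def _read_vec16(buf: bytes, off: int) -> Tuple[bytes, int]:
    """Read a vector with a 2-byte big-endian length prefix; return (data, next offset)."""
    if off + 2 > len(buf):
        raise ValueError("truncated length field")
    (n,) = struct.unpack_from("!H", buf, off)
    start, end = off + 2, off + 2 + n
    if end > len(buf):
        raise ValueError("truncated vector")
    return buf[start:end], end


def parse_ske_dhe(body: bytes) -> ServerDHParams:
    """Parse a TLS 1.2 DHE ServerKeyExchange body into its fields."""
    p, off = _read_vec16(body, 0)
    g, off = _read_vec16(body, off)
    ys, off = _read_vec16(body, off)
    if off + 2 > len(body):
        raise ValueError("truncated signature algorithm")
    (sig_alg,) = struct.unpack_from("!H", body, off)
    signature, off = _read_vec16(body, off + 2)
    if off != len(body):
        raise ValueError("trailing bytes after signature")
    return ServerDHParams(p, g, ys, sig_alg, signature)


if __name__ == "__main__":
    # Synthetic body shaped like the pcp-broker one: 1024-bit p and Ys, 512-byte signature.
    def vec16(data: bytes) -> bytes:
        return struct.pack("!H", len(data)) + data

    body = (vec16(b"\xff" * 128) + vec16(b"\x02") + vec16(b"\x01" * 128)
            + struct.pack("!H", 0x0804) + vec16(b"\x00" * 512))
    params = parse_ske_dhe(body)
    print(len(body), len(params.p) * 8, hex(params.sig_alg))  # 779 1024 0x804
```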