
Wildcard SSL certificates in a Cloud Kubernetes deployment

27 February 2020
Reading time: 15 minutes

TL;DR: in the brave new world of DevOps, you need to allow time both for learning in depth (the links in this article are a good start) and for trial and error if you want to achieve something even slightly non-standard.


Security can be darn awkward

If you have ever published a secure HTTP service to the World Wide Web, you will be aware that the server’s X.509 SSL certificate has to be compatible with access through a number of different domain-names. Different clients may refer to your server variously as e.g. mydomain.com, www.mydomain.com, myservice.mydomain.com or www.myservice.mydomain.com. Accesses from behind the Internet gateway may also refer to the server by some internal name, e.g. myservice.mydomain.co.uk. Moreover, you may wish to facilitate canary releases, or fast switching between different versions of the service just by updating DNS entries, which requires yet more aliases.

All the possible domain-names that you might enter in the DNS and use to access the service have to be listed as Subject Alternative Names (SANs) in the SSL certificate. Thankfully there is a shortcut: use a wildcard such as *.mydomain.com.
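
If you want to check which names an existing server actually covers, a quick way (plain openssl against the placeholder domain used throughout this article, not part of our tooling) is:

# Print the SAN list of the certificate that the server actually presents
echo | openssl s_client -connect mydomain.com:443 -servername mydomain.com 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'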

I recently spent almost a week of frustration getting a group of Kubernetes-hosted micro-services to return a server certificate containing a wildcard DNS name. We eventually found one solution that worked (even though, for various reasons, we have not yet put it into production).

Many different moving parts

For reasons that I won’t go into, our team decided to create a Kubernetes cluster in AWS instead of using the AWS container service ECS. I have no reason to believe that the approach described below wouldn’t work in ECS. However, we have encountered so many surprises along the way that I almost expect some kind of gotcha.

We used Terraform to set up our infrastructure (AWS virtual private clouds – VPCs, databases, Kubernetes clusters and so on). We used Skaffold to manage builds and Helm 2 to deploy Docker containers. The applications, whose source is kept in GitHub, are written mainly in Node.js and AngularJS. We use Liquibase to initialise and maintain the database schema and contents. There are separate development/test/QA and preproduction/production clusters, as well as a third cluster to house our Jenkins and ChartMuseum installations.

That’s a lot of knowledge for anyone new to the project to absorb. On top of this, developers must get to grips with the application architecture itself, which has evolved over a number of years. Part of this evolution was the transition to a container architecture, which involved automating the provision of SSL/TLS server certificates using a standard cert-manager container. For cost reasons, we chose to obtain our certificates from Let’s Encrypt.

The shape of the solution

The following simplified diagram illustrates how the application works. IoT clients use Service 1 (Svc1), while mobile clients and Web clients (i.e. HTML / AngularJS pages) use Service 2 (Svc2). The UI landing page loads the Web application into the browser from the UI service. All three client types log on using an Authentication service that returns a standard OAuth2 token. Two background micro-services process events received on Redis queues (illustrated as dashed-line arrows), including some events generated by a scheduler clock.

Application Architecture Schematic

I readily acknowledge that multiple processes all accessing the same database is not ideal. Nor is using a custom scheduler clock in place of the Kubernetes CronJob controller. Did I mention that this architecture is constantly evolving?

Each of the orange squares above represents a separate Docker container; even the Redis message broker. These containers run in Kubernetes “pods”. In minikube and in the dev/test/QA cluster, the database is also provisioned as a pod with an associated persistent volume claim. This allows us to run as many different copies of the application as we need to avoid different developers and testers tripping over each other. (In pre-production and production, Amazon’s RDS PostgreSQL service is used instead for resilience and performance).

The parallel-running copies of the application are separated by namespaces. Each cluster has multiple namespaces, distinguished by an application-specific suffix. For example, for the primary deployment of the latest development version, the suffix would be “-dev”, leading to namespaces myservice-dev and nginx-controller-dev. The latter accommodates only the ingress service (an nginx reverse proxy server) and a default back-end, which we don’t use.

One or more helm charts define the contents of each namespace. The following illustration indicates the relationship between namespaces within a complete cluster. In a few of the namespaces, I have listed the names of the pods (omitting duplicates).

namespace schematic

Because all HTTPS accesses to any one deployment are made via one unique ingress, we terminate the SSL/TLS connection there and forward requests onward as plain HTTP to the first-tier micro-services. Each of these micro-services has a distinct internal IP address and port number within the cluster, associated with a service name, which the routing section of the ingress configuration uses to route requests for different URI paths to different back-ends.

Managing Server Certificates

The cert-manager pods live in their own namespace. There is only one of these per Kubernetes cluster, so it doesn’t need the suffix mentioned above. Every Ingress knows which certificate(s) it needs and requests them from the cert-manager, which either returns the requested certificate from its cache or generates a new certificate request via the appropriate issuer (or cluster issuer).

Issuers and ClusterIssuers are Kubernetes resources that represent certificate authorities (CAs), able to generate signed certificates by honouring certificate-signing requests (CSRs). Every cert-manager certificate references an issuer, and that issuer must be in a ready condition before it will attempt to honour the request. ClusterIssuers are more versatile, because:

  • They can issue certificates into any desired namespace, whereas Issuers are confined to a single namespace
  • Crucially in this context, unlike the Issuer, the ClusterIssuer uses “ambient credentials” by default, which means that it can assume the required IAM roles in AWS, allowing cert-manager to solve DNS01 challenges (see below).

To successfully request a certificate from Let’s Encrypt, cert-manager must solve ACME challenges in order to prove that the client controls the DNS names being requested. It encapsulates one or more challenges (one per domain-name) in an Order, which you can think of as an instance of a CSR. Cert-manager currently supports two types of challenge – DNS01 and HTTP01. To validate a certificate-signing request containing wildcards in either the common name or any of the SANs, Let’s Encrypt requires the DNS01 challenge. To solve it, cert-manager has to add and remove special DNS entries. For certificate requests without wildcards, the HTTP01 challenge is sufficient, and a non-cluster Issuer has sufficient privileges to enable cert-manager to complete it.
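
You can watch this happening from outside the cluster: while an Order is in progress, cert-manager publishes a TXT record under the well-known _acme-challenge label, which you can query with dig (shown here against the example domain):

# The DNS01 challenge token appears here while an Order is pending and is
# removed again once Let's Encrypt has validated it
dig +short TXT _acme-challenge.mydomain.com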

To the code!

Enough introduction; let’s have a look at some files.

Terraform

I’ll skim over this because, as likely as not, your solution looks quite different from ours. Best-practice guides (in ascending order of size / complexity) are published by Bill Wang, Anton Babenko and HashiCorp itself.

Helm

Within the devops module of our solution code repository, we have a folder per templated namespace. From the diagram above, the folders would be called myservice and nginx-controller. Whenever we install or upgrade the helm chart, we specify the release and the target namespace, both including the same suffix (see the helm upgrade commands below).

nginx-controller

Because we use a helm chart from a public chart repository, all we need to store here are some configuration values.

devops/nginx-controller/values.yaml:


controller:
  ingressClass: nginx
  rbac:
    create: true
  config:
    hide-headers: X-Powered-By

  proxySetHeaders:
    # X-Forwarded-For, -Host, -Port and -Proto are automatically added
    X-Forwarded-Server: $host
    
  addHeaders:
    # Strict-Transport-Security header is automatically added unless controller.config.hsts is set to "false"
    X-Frame-Options: DENY
    X-Content-Type-Options: nosniff
    X-XSS-Protection: "1; mode=block"
    Content-Security-Policy: "default-src 'none'; script-src 'self' www.google-analytics.com zuhlkeems.atlassian.net 'unsafe-eval' 'unsafe-inline'; font-src 'self'; connect-src 'self'; img-src 'self' data: zuhlkeems.atlassian.net www.google-analytics.com; style-src 'self' 'unsafe-inline'; child-src zuhlkeems.atlassian.net 'self';"

Separate files define some additional context-dependent configuration values. Note in particular that extraArgs defines command-line arguments passed to the nginx ingress controller at start-up. Thus in the CD context and in the production context, we’re telling the controller to get the default server certificate from the named secret in the kube-system namespace (unless a more specific one is configured for a given ingress). Later, we’ll ensure that this secret contains a wildcard certificate properly signed by Let’s Encrypt.

devops/contexts/minikube/nginx-controller-values.yaml:


controller:
  service:
    type: "NodePort"
  config:
    hsts: "false"
  extraArgs:
    default-ssl-certificate: "kube-system/ingress-cert-tls-selfsigned"

devops/contexts/cd.mydomain.com/nginx-controller-values.yaml:


controller:
  service:
    type: "LoadBalancer"
  config:
    hsts: "true"
  extraArgs:
    default-ssl-certificate: "kube-system/ingress-cert-tls-acme"

To deploy, we create the target namespace, set the shell variables as follows and then use the snippet of bash script below:

  • context: the designation of the target cluster; e.g. minikube, cd.mydomain.com or prd.mydomain.com
  • nginx_controller_release: one of nginx-controller-dev, nginx-controller-test, nginx-controller-qa or a temporary release for end-to-end testing
  • nginx_controller_namespace: the same as nginx_controller_release
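
The namespace creation itself is not shown in the original snippet; it is just a one-liner along these lines (the || true keeps the script happy when the namespace already exists):

# Create the target namespace if it does not already exist
kubectl create namespace ${nginx_controller_namespace} || true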

helm upgrade \
${nginx_controller_release} \
stable/nginx-ingress \
--version 1.26.2 \
--install \
--namespace ${nginx_controller_namespace} \
-f devops/nginx-controller/values.yaml \
-f devops/contexts/${context}/nginx-controller-values.yaml

A further snippet of bash script asks AWS’s Route53 service to create a DNS entry – an A record that defines an alias to the automatically provisioned Elastic Load Balancer, once it has started up. We chose to use the suffixed service name as the domain name, e.g. myservice-dev.mydomain.com, myservice-test.mydomain.com, myservice-qa.mydomain.com or a temporary domain name for end-to-end testing.
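
We won’t reproduce that script verbatim, but a minimal sketch looks something like the following, assuming the default service naming of the stable/nginx-ingress chart and with both hosted-zone IDs as placeholders (ZAAAAAAAAAAAAA is the Route53 zone for mydomain.com; ZBBBBBBBBBBBBB is the ELB’s own canonical hosted zone):

# Look up the hostname of the ELB created for the nginx-ingress LoadBalancer service
elb_hostname=$(kubectl -n ${nginx_controller_namespace} get svc \
  ${nginx_controller_release}-nginx-ingress-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Upsert an alias A record, e.g. myservice-dev.mydomain.com -> the ELB
aws route53 change-resource-record-sets \
  --hosted-zone-id ZAAAAAAAAAAAAA \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "myservice-dev.mydomain.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "ZBBBBBBBBBBBBB",
          "DNSName": "'"${elb_hostname}"'",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'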

myservice

To confer the required level of privilege on the cluster issuer in the cluster cd.mydomain.com (where aaaaaaaaaaaa stands for your account ID, a 12-digit number string):

  • Create an AWS IAM role arn:aws:iam::aaaaaaaaaaaa:role/cert-manager based on the most common use case, EC2
  • Attach a custom policy arn:aws:iam::aaaaaaaaaaaa:policy/DNS01ChallengeSolver (see below)
  • Attach the trust relationship shown below, remembering to substitute your account ID for aaaaaaaaaaaa and your cluster’s domain name for cd.mydomain.com

DNS01ChallengeSolver policy:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "route53:GetChange",
            "Resource": "arn:aws:route53:::change/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "route53:ChangeResourceRecordSets",
                "route53:ListResourceRecordSets"
            ],
            "Resource": "arn:aws:route53:::hostedzone/*"
        }
    ]
}

cert-manager trust relationship:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::aaaaaaaaaaaa:role/nodes.cd.mydomain.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

You may have to add further Statements to allow the cluster issuers of any other cluster in your hosted zone (e.g. prd.mydomain.com) to also assume the same role.
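
If you prefer to script this instead of clicking through the IAM console, the equivalent AWS CLI calls look roughly as follows, assuming the two JSON documents above have been saved locally as dns01-challenge-solver-policy.json and cert-manager-trust.json (hypothetical filenames):

# Create the custom policy from the DNS01ChallengeSolver document above
aws iam create-policy \
  --policy-name DNS01ChallengeSolver \
  --policy-document file://dns01-challenge-solver-policy.json

# Create the role with the trust relationship shown above, then attach the policy
aws iam create-role \
  --role-name cert-manager \
  --assume-role-policy-document file://cert-manager-trust.json

aws iam attach-role-policy \
  --role-name cert-manager \
  --policy-arn arn:aws:iam::aaaaaaaaaaaa:policy/DNS01ChallengeSolver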

The following Helm chart controls the deployment of the multi-service solution. Adapt it to reflect your service and domain name:

devops/myservice/Chart.yaml:


apiVersion: v1
name: myservice
description: Helm chart for the myservice application

version: 0.x.y
appVersion: 0.x.y
keywords:
  - myservice
sources:
  - https://github.com/mydomain/myservice

The values files below configure the deployment. By virtue of the way in which we invoke the above chart (see the command line below), values set in the later, context-specific files override the defaults in the first one. Command-line definitions in turn override values from the files. For example, we use overrides to substitute locally built Docker images for those in the project Docker repository built and tagged by our CI/CD service. Most of the configuration values below relate to the solution architecture shown above, but pay particular attention to the ingress section. Remember to substitute your Docker image repository for aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com:

devops/myservice/values.yaml:


nodeEnv: dev
localdev: false

postgres:
  ****: ****
  ****: ****

redis:
  host: redis

myserviceDb:
  enabled: false
  host: myservice-db
  image: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-db:0.x.y
  liquibaseImage: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-liquibase:0.x.y

ui:
  host: myservice-ui
  port: 80
  image: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-ui:0.x.y

auth:
  host: myservice-auth
  port: 3004
  image: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-auth:0.x.y

svc1:
  host: myservice-svc1
  port: 3000
  image: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-svc1:0.x.y

svc2:
  host: myservice-svc2
  port: 3002
  image: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-svc2:0.x.y

clck:
  image: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-clck:0.x.y

svc3:
  image: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-svc3:0.x.y

svc4:
  image: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-svc4:0.x.y

ingress:
  hostname: k8s.mydomain.com
  # Set this to the same value as controller.ingressClass used in the
  # nginx controller chart.
  # This can be used to support multiple ingress controllers for multi-tenancy.
  ingressClass: nginx
  acmeAccountEmail: acme.manager@mydomain.com
  acmeStaging: false

devops/contexts/minikube/dev-myservice-values.yaml:


localdev: true
ingress:
  acmeStaging: true
myserviceDb:
  image: myservice-db:latest
  liquibaseImage: myservice-liquibase-with-e2e-test-data:latest
ui:
  image: myservice-ui:latest
auth:
  image: myservice-auth:latest 
svc1:
  image: myservice-svc1:latest
svc2:
  image: myservice-svc2:latest
clck:
  image: myservice-clck:latest
svc3:
  image: myservice-svc3:latest
svc4:
  image: myservice-svc4:latest

devops/contexts/cd.mydomain.com/dev-myservice-values.yaml:


ingress:
  acmeStaging: true
myserviceDb:
  enabled: true
  liquibaseImage: aaaaaaaaaaaa.dkr.ecr.eu-west-2.amazonaws.com/myservice-liquibase-with-test-data:0.x.y

We launch the chart using the bash script snippet below (or the equivalent Skaffold deploy.helm configuration). First we create the target namespace and set the shell variables as follows:

  • context: the designation of the target cluster; e.g. minikube, cd.mydomain.com or prd.mydomain.com
  • contextEnv: one of dev-, test-, qa- or blank by default for end-to-end testing (the values files above illustrate the dev- case)
  • myservice_release: one of myservice-dev, myservice-test, myservice-qa or a temporary release for end-to-end testing
  • myservice_namespace: the same as myservice_release
  • hostname: the externally visible domain-name of the service; the same as the release name (see below) fully qualified with your domain-name (mydomain.com in this example)

helm upgrade \
${myservice_release} \
devops/myservice \
--install \
--namespace ${myservice_namespace} \
--set ingress.hostname=${hostname} \
-f devops/contexts/${context}/${contextEnv}myservice-values.yaml
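
As noted earlier, anything passed on the command line with --set wins over all of the values files; that is how we substitute a locally built image, for example (a purely hypothetical invocation for the minikube context, with a feature-branch image tag):

# Deploy the dev release, but point svc2 at a locally built, feature-branch-tagged image
helm upgrade myservice-dev devops/myservice \
  --install \
  --namespace myservice-dev \
  -f devops/contexts/minikube/dev-myservice-values.yaml \
  --set svc2.image=myservice-svc2:my-feature-branch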

Under devops/myservice (the chart path passed to helm upgrade), Helm expects to find the file Chart.yaml and the default values.yaml (see above), together with an optional tests folder and a templates folder. The templates folder contains template yaml files for each resource to be deployed by the chart. Additional files you may find here include _helpers.tpl, which defines snippets of yaml with parameter placeholders that can be substituted into other yaml files.

We’ll examine just a few of these resources here. The first defines the ingress routing rules and the name of the cert-manager secret, either ingress-cert-tls-selfsigned or (if using the Let’s Encrypt service) ingress-cert-tls-acme. It is important to note that the routing rules do not specify a hostname; they apply equally, regardless of the hostname used to access the ingress. If they did specify one, we would have to repeat the rules for every possible hostname a client might use.

An excerpt from devops/myservice/templates/_helpers.tpl:


{{- define "myservice.certSecretName" }}ingress-cert-tls-{{ if .Values.localdev }}selfsigned{{ else }}acme{{ end }}{{- end }}

{{- define "myservice.ingressPaths" }}
  - http:
      paths:
      - path: /svc2
        backend:
          serviceName: {{ $.Values.svc2.host }}
          servicePort: {{ $.Values.svc2.port }}
      - path: /explorer
        backend:
          serviceName: {{ $.Values.svc2.host }}
          servicePort: {{ $.Values.svc2.port }}
      - path: /svc1
        backend:
          serviceName: {{ $.Values.svc1.host }}
          servicePort: {{ $.Values.svc1.port }}
      - path: /auth
        backend:
          serviceName: {{ $.Values.auth.host }}
          servicePort: {{ $.Values.auth.port }}
      - path: /
        backend:
          serviceName: {{ $.Values.ui.host }}
          servicePort: {{ $.Values.ui.port }}
{{- end }}

devops/myservice/templates/ingress.yaml:


apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: application
  annotations:
    cert-manager.io/cluster-issuer: certificate-cluster-issuer
    kubernetes.io/ingress.class: {{ .Values.ingress.ingressClass }}
    {{- if .Values.localdev }}
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    {{- end }}
spec:
  rules:
{{- template "myservice.ingressPaths" (dict "Values" .Values "ruleHostname" .Values.ingress.hostname ) }}

devops/myservice/templates/ingress-certificate.yaml:


apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: {{ template "myservice.certSecretName" . }}
  namespace: kube-system
spec:
  commonName: "*.mydomain.com"
  dnsNames:
  - "*.mydomain.com"
  - "mydomain.com"
  acme:
    config:
    - dns01:
        provider: dns
      domains:
      - "*.mydomain.com"
      - "mydomain.com"
  secretName: {{ template "myservice.certSecretName" . }}
  issuerRef:
    name: certificate-cluster-issuer
    kind: ClusterIssuer

devops/myservice/templates/certificate-cluster-issuer.yaml:


apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: certificate-cluster-issuer
spec:
{{- if .Values.localdev }}
  selfSigned: {}
{{- else }}
  acme:
{{- if .Values.ingress.acmeStaging }}
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging
{{- else }}
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt
{{- end}}
    email: {{ .Values.ingress.acmeAccountEmail }}
    solvers:
    - dns01:
        cnameStrategy: Follow
        route53:
          region: eu-west-2
          hostedZoneID: zzzzzzzzzzzz # optional
          role: arn:aws:iam::aaaaaaaaaaaa:role/cert-manager
    - http01:
        ingress:
          class:  {{ .Values.ingress.ingressClass }}
{{- end }}

In the above yaml files, substitute your Let’s Encrypt subscriber email address, domain-name, service name, hosted-zone ID and account ID as appropriate.

Concluding Remarks

Unexpected Limits

Let’s Encrypt limits you to 50 certificates per registered domain per week, and at most 5 duplicate certificates (identical sets of names) per week. While you are experimenting, therefore, it is wise to stick to “staging certificates”, where the limits are far more generous. Once you’re confident that your deployment environments are stable, e.g. that you are no longer re-deploying the cert-manager several times a day, you can remove the directive ingress.acmeStaging: true from the appropriate context value file(s) and get fully valid certificates from Let’s Encrypt.
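
One way to confirm which kind of certificate a given environment is currently serving (staging intermediates are clearly labelled as such in the issuer name) is to ask openssl:

# Print the issuer and validity dates of the certificate actually being served
echo | openssl s_client -connect myservice-dev.mydomain.com:443 \
  -servername myservice-dev.mydomain.com 2>/dev/null \
  | openssl x509 -noout -issuer -dates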

Debugging tools

While debugging, we found kubectl get, kubectl describe and kubectl logs to be very useful tools. Here’s the start of a typical forensic session. Pay particular attention to anything logged under the Events heading of any describe output.

$ kubectl get secret -A | grep tls
cert-manager                        cert-manager-webhook-ca                                           kubernetes.io/tls                     3      7d3h
cert-manager                        cert-manager-webhook-tls                                          kubernetes.io/tls                     3      7d3h
kube-system                         ingress-cert-tls-acme                                             kubernetes.io/tls                     3      7d1h

$ kubectl describe -n kube-system secret ingress-cert-tls-acme
Name:         ingress-cert-tls-acme
Namespace:    kube-system
Labels:       <none>
Annotations:  cert-manager.io/alt-names: *.mydomain.com,mydomain.com
              cert-manager.io/certificate-name: ingress-cert-tls-acme
              cert-manager.io/common-name: *.mydomain.com
              cert-manager.io/ip-sans: 
              cert-manager.io/issuer-kind: ClusterIssuer
              cert-manager.io/issuer-name: certificate-cluster-issuer
              cert-manager.io/uri-sans: 

Type:  kubernetes.io/tls

Data
====
tls.key:  1675 bytes
ca.crt:   0 bytes
tls.crt:  3611 bytes

$ kubectl get cert -A
NAMESPACE                NAME                  READY   SECRET                  AGE
kube-system              ingress-cert-tls-acme False   ingress-cert-tls-acme   3d18h

$ kubectl -n kube-system describe cert ingress-cert-tls-acme
Name:         ingress-cert-tls-acme
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1alpha2
Kind:         Certificate
Metadata:
  Creation Timestamp:  2020-02-20T17:32:12Z
  Generation:          1
  Resource Version:    11242880
  Self Link:           /apis/cert-manager.io/v1alpha2/namespaces/kube-system/certificates/ingress-cert-tls-acme
  UID:                 9180b77a-0a91-4ced-829b-910aa761ca55
Spec:
  Common Name:  *.mydomain.com
  Dns Names:
    *.mydomain.com
    mydomain.com
  Issuer Ref:
    Name:       certificate-cluster-issuer
  Secret Name:  ingress-cert-tls-acme
Status:
  Conditions:
    Last Transition Time:  2020-02-20T17:32:12Z
    Message:               Certificate is up to date and has not expired
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2020-05-19T10:05:01Z
Events:                    <none>

$ kubectl get po -n cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-6f9696dd7-vlr8q               1/1     Running   0          3d22h
cert-manager-cainjector-7d47d59998-gljdh   1/1     Running   0          3d22h
cert-manager-webhook-6559cc8549-z7cbz      1/1     Running   0          3d22h

$ kubectl -n cert-manager logs cert-manager-6f9696dd7-vlr8q
I0225 10:18:39.911656       1 controller.go:135] cert-manager/controller/challenges "msg"="finished processing work item" "key"="########" 
I0225 10:18:39.918142       1 controller.go:129] cert-manager/controller/challenges "msg"="syncing item" "key"="########" 
...

The following resource types are particularly useful to inspect with kubectl get and kubectl describe (a typical walk down the chain is sketched after this list):

  • namespace (ns)
  • pod (po)
  • secret
  • certificate (cert)
  • certificaterequest (cr)
  • order
  • challenge
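
When an issuance gets stuck, it usually pays to walk that chain from the top down. The challenge name below is a hypothetical example of the generated names you will see:

# Start at the Certificate and work down to the Challenge(s) it spawned
kubectl -n kube-system describe cert ingress-cert-tls-acme
kubectl -n kube-system get certificaterequest,order,challenge
# then describe whichever Order or Challenge is not yet valid, for example:
kubectl -n kube-system describe challenge ingress-cert-tls-acme-1234567890-0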

Another really useful tool is the Let’s Debug Toolkit, which lets you view all non-staging certificates issued for your domain within the last week, month or quarter. It was this tool that first revealed to us the weekly limits imposed by Let’s Encrypt.

 

Let's Debug Toolkit screenshot

