1 - Auto Healing

Automatic Healing for Unresponsive or Failed Kubernetes Nodes

In our Kubernetes Services, we have implemented a robust auto-healing mechanism to ensure the high availability and reliability of our infrastructure. This system is designed to automatically manage and replace unhealthy nodes, thereby minimizing downtime and maintaining the stability of our services.

Auto-Healing Mechanism

Triggers

  1. Unready Node Detection:

    • The auto-healing process is triggered when a node remains in a “not ready” or “unknown” state for 15 minutes.
    • This delay allows for transient issues to resolve themselves without unnecessary node replacements.
  2. Node Creation Failure:

    • To ensure new nodes are given adequate time to initialize and join the cluster, we have configured startup timers:
      • Control Plane Nodes:
        • A new control plane node has a maximum startup time of 30 minutes. This extended period accounts for the critical nature and complexity of control plane operations.
      • Worker Nodes:
        • A new worker node has a maximum startup time of 10 minutes, reflecting the relatively simpler setup process compared to control plane nodes.
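
The startup deadlines above can be sketched as a small helper (illustrative only; the function name is hypothetical and the real timers run inside our platform):

```shell
# Illustrative only: maps a node role to its maximum startup time in minutes,
# per the policy described above.
startup_timeout_minutes() {
  case "$1" in
    control-plane) echo 30 ;;
    worker)        echo 10 ;;
  esac
}

startup_timeout_minutes control-plane   # 30
startup_timeout_minutes worker          # 10
```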

Actions

  1. Unresponsive Node:
    • Once a node is identified as unready for the specified duration, the auto-healing system deletes the old node.
    • Simultaneously, it initiates the creation of a new node to take its place, ensuring the cluster remains properly sized and functional.

Built-in Failsafe

To prevent cascading failures and to handle scenarios where multiple nodes become unresponsive, we have a built-in failsafe mechanism:

  • Threshold for Unresponsive Nodes:
    • If more than 35% of the nodes in the cluster become unresponsive simultaneously, the failsafe activates.
    • This failsafe blocks any further changes, as such a widespread issue likely indicates a broader underlying problem, such as network or platform-related issues, rather than isolated node failures.
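
As a hedged sketch of the 35% threshold (the function is hypothetical; the real failsafe runs inside our platform and is not user-configurable):

```shell
# Illustrative only: prints whether auto-healing may proceed given the
# total node count and the number of unresponsive nodes.
healing_allowed() {
  total=$1; unresponsive=$2
  # integer math: unresponsive/total > 35%  <=>  unresponsive*100 > total*35
  if [ $(( unresponsive * 100 )) -gt $(( total * 35 )) ]; then
    echo "failsafe: blocked"
  else
    echo "healing: allowed"
  fi
}

healing_allowed 10 3   # healing: allowed  (30% unresponsive)
healing_allowed 10 4   # failsafe: blocked (40% unresponsive)
```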

By integrating these features, our Kubernetes Services can automatically handle node failures and maintain high availability, while also providing safeguards against systemic issues. This auto-healing capability ensures that our infrastructure remains resilient, responsive, and capable of supporting continuous service delivery.

2 - Auto Scaling

Automatically scale your Kubernetes nodes

We now offer autoscaling of nodes.

What is a nodegroup?

To simplify node management we now have nodegroups.

A nodegroup is a set of nodes that spans all three of our availability zones. All nodes in a nodegroup use the same flavor. This means that if you want to mix flavors in your cluster, there will be at least one nodegroup per flavor. We can also create custom nodegroups upon request, meaning you can have two nodegroups with the same flavor.

By default, clusters are created with one nodegroup called “worker”. When listing nodes by running kubectl get nodes you can see the nodegroup by looking at the node name. All node names begin with clustername-nodegroup.

In the example below we have the cluster hux-lab1. The default workers are located in the nodegroup worker, and the added nodegroup nodegroup2 holds a few extra nodes.

❯ kubectl get nodes
NAME                           STATUS   ROLES           AGE     VERSION
hux-lab1-control-plane-c9bmm   Ready    control-plane   2d18h   v1.27.3
hux-lab1-control-plane-j5p42   Ready    control-plane   2d18h   v1.27.3
hux-lab1-control-plane-wlwr8   Ready    control-plane   2d18h   v1.27.3
hux-lab1-worker-447sn          Ready    <none>          2d18h   v1.27.3
hux-lab1-worker-9ltbp          Ready    <none>          2d18h   v1.27.3
hux-lab1-worker-htfbp          Ready    <none>          15h     v1.27.3
hux-lab1-worker-k56hn          Ready    <none>          16h     v1.27.3
hux-lab1-nodegroup2-33hbp      Ready    <none>          15h     v1.27.3
hux-lab1-nodegroup2-54j5k      Ready    <none>          16h     v1.27.3
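
Given the clustername-nodegroup naming convention, you can count nodes per nodegroup with a sketch like this (shown here on a few of the sample names above rather than live kubectl output):

```shell
# Strip the trailing random suffix from each node name to recover
# "<clustername>-<nodegroup>", then count nodes per group.
names="hux-lab1-worker-447sn
hux-lab1-worker-9ltbp
hux-lab1-nodegroup2-33hbp"

echo "$names" | sed 's/-[^-]*$//' | sort | uniq -c
```

Against a live cluster the same idea would be `kubectl get nodes -o name | sed 's#node/##; s/-[^-]*$//' | sort | uniq -c`.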

How to activate autoscaling?

Autoscaling currently needs to be configured by Elastx support.

To activate auto scaling we need to know the cluster name and nodegroup, along with the minimum and maximum number of desired nodes. Currently the minimum is set to 3 nodes; however, this is subject to change in the future.

Nodes are split across availability zones, meaning if you want 3 nodes you get one in each availability zone.

Another example is a minimum of 3 nodes and a maximum of 7. This translates to a minimum of one node per availability zone, and a maximum of 3 in STO1 and 2 each in STO2 and STO3. To keep it simple we recommend using increments of 3.
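
The round-robin spread over the three availability zones can be sketched as follows (illustrative only; the actual placement is done by the platform):

```shell
# Illustrative only: how n nodes distribute over 3 AZs, filling STO1 first.
spread() {
  n=$1
  echo "STO1=$(( (n + 2) / 3 )) STO2=$(( (n + 1) / 3 )) STO3=$(( n / 3 ))"
}

spread 3   # STO1=1 STO2=1 STO3=1
spread 7   # STO1=3 STO2=2 STO3=2
```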

If you are unsure, contact our support and we will help you get the configuration you wish for.

How does autoscaling know when to add additional nodes?

Nodes are added when they are needed. There are two scenarios:

  1. You have a pod that fails to be scheduled on existing nodes.
  2. Scheduled pods request more than 100% of any resource. This method is aware of the amount of resources per node and can therefore add more than one node at a time if required.

When does the autoscaler scale down nodes?

The autoscaler removes nodes when it senses there are enough free resources to accommodate all current workloads (based on requests) on fewer nodes. To avoid all nodes having 100% resource requests (and thereby usage), there is also a built-in mechanism ensuring that at least 50% of a node's resources remain available to accept additional requests.

This means that if you have a nodegroup with 3 nodes, each with 4 CPU cores, a total of 2 CPU cores must remain unrequested by any workload.

To avoid triggering the auto-scaling feature excessively, there is a built-in delay of 10 minutes for scale-down actions. Scale-up events are triggered immediately.

Can I disable auto scaling after activating it?

Yes, just contact Elastx support and we will help you with this.

When auto scaling is disabled, the node count is locked. Contact support if the number of nodes you wish to keep deviates from the current number of nodes, and we will scale it for you.

3 - Cert-manager and Cloudflare demo

Using Cluster Issuer with cert-manager and wildcard DNS

In this guide we will use a Cloudflare-managed domain and our own cert-manager to provide Let’s Encrypt certificates for a test deployment.

The guide is suitable if you have a domain connected to a single cluster and would like to issue and manage certificates from within Kubernetes. The setup below is cluster-wide, meaning it will deploy certificates to any namespace specified.

Prerequisites

Setup ClusterIssuer

Create a file holding the Secret with the API token for your Cloudflare DNS, followed by the ClusterIssuer configuration adapted for Cloudflare.

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager
type: Opaque
stringData:
  api-token: "<your api token>"
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cloudflare-issuer
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <your email>
    privateKeySecretRef:
      name: cloudflare-issuer-key
    solvers:
    - dns01:
        cloudflare:
          email: <your email>
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token

kubectl apply -f cloudflare-issuer.yml

The ClusterIssuer should become ready shortly. Example output:

kubectl get clusterissuers.cert-manager.io 
NAME                READY   AGE
cloudflare-issuer   True    6d18h
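
If you want to script a readiness check, you can parse the READY column. The sketch below runs against the captured sample output above; in a cluster you would swap in real `kubectl get clusterissuers.cert-manager.io` output:

```shell
sample='NAME                READY   AGE
cloudflare-issuer   True    6d18h'

ready=$(echo "$sample" | awk '$1 == "cloudflare-issuer" { print $2 }')
echo "$ready"   # True
```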

Expose a workload and secure it with a Let’s Encrypt certificate

In this section we will set up a deployment with its accompanying service and ingress object. The ingress object will request a certificate for test2.domain.ltd and, once fully up and running, should provide https://test2.domain.ltd with a valid Let’s Encrypt certificate.

We’ll use the created ClusterIssuer and let cert-manager request new certificates for any added ingress object. This setup requires a wildcard (“*”) record in the DNS provider.

This is how the DNS is set up in this particular example: an A record (“domain.ltd”) points to the load balancer IP of the cluster, and a CNAME record (“*”) points to that A record.

This example also specifies the namespace “echo2”.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo2-dep
  namespace: echo2
spec:
  selector:
    matchLabels:
      app: echo2
  replicas: 1
  template:
    metadata:
      labels:
        app: echo2
    spec:
      containers:
      - name: echo2
        image: hashicorp/http-echo
        args:
        - "-text=echo2"
        ports:
        - containerPort: 5678
      securityContext:
        runAsUser: 1001
        fsGroup: 1001
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: echo2
  name: echo2-service
  namespace: echo2
spec:
  ports:
    - protocol: TCP
      port: 5678
      targetPort: 5678
  selector:
    app: echo2
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo2-ingress
  namespace: echo2
  annotations:
    cert-manager.io/cluster-issuer: cloudflare-issuer
    kubernetes.io/ingress.class: "nginx"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - test2.domain.ltd
    secretName: test2-domain-tls
  rules:
  - host: test2.domain.ltd
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo2-service
            port:
              number: 5678

The DNS challenge and certificate issuance process takes a couple of minutes. You can follow the progress by watching:

kubectl events -n cert-manager

Once completed, it should all be accessible at https://test2.domain.ltd

4 - Change PV StorageClass

How to migrate between storage classes

This guide details all steps to change the storage class of a volume. The instructions can be used to migrate from one storage class to another while retaining data, for example from 8k to v2-4k.

Prerequisites

  • Access to the Kubernetes cluster
  • Access to the OpenStack Kubernetes project

Preparation steps

  1. Populate variables

    Complete with relevant names for your setup, then copy/paste them into the terminal to set them as environment variables that will be used throughout the guide. PVC is the name of the PersistentVolumeClaim to migrate.

    PVC=test1
    NAMESPACE=default
    NEWSTORAGECLASS=v2-1k
    
  2. Fetch and populate the PV name by running:

    PV=$(kubectl get pvc -n $NAMESPACE $PVC -o go-template='{{.spec.volumeName}}')
    
  3. Create backup of PVC and PV configurations

    Fetch the PVC and PV configurations and store in /tmp/ for later use:

    kubectl get pvc -n $NAMESPACE $PVC -o yaml | tee /tmp/pvc.yaml
    kubectl get pv  $PV -o yaml | tee /tmp/pv.yaml
    
  4. Change VolumeReclaimPolicy

    To avoid deletion of the PV when deleting the PVC, the volume needs to have VolumeReclaimPolicy set to Retain.

    Patch:

    kubectl patch pv $PV -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
    
  5. Stop pods from accessing the mounted volume (i.e. kill pods, scale down the StatefulSet, etc.).

  6. Delete the PVC.

    kubectl delete pvc -n "$NAMESPACE" "$PVC"
    

Log in to OpenStack

  1. Navigate to: Volumes -> Volumes

  2. Make a backup of the volume. From the drop-down to the right, select Backup. The backup is good practice; it is not used in the following steps.

  3. Change the storage type to the desired type. The volume should now, or shortly, have status Available. From the drop-down to the right, select Edit volume -> Change volume type:

    • Select your desired storage type
    • Select Migration policy=Ondemand

    The window will close, and the volume will be updated and migrated (to the v2 storage platform) if necessary by the backend. The status becomes “Volume retyping”. Wait until it has completed.

    We have a complementary guide here.

Back to Kubernetes

  1. Release the tie between PVC and PV

    The PV still references its old PVC in the claimRef, found under spec.claimRef.uid. This UID needs to be nullified to release the PV, allowing it to be adopted by a PVC with the correct storageClass.

    Patch claimRef to null:

    kubectl patch pv "$PV" -p '{"spec":{"claimRef":{"namespace":"'$NAMESPACE'","name":"'$PVC'","uid":null}}}'
    
  2. The PV StorageClass in Kubernetes does not match its counterpart in OpenStack.

    We need to patch the storageClassName reference in the PV:

    kubectl patch pv "$PV" -p '{"spec":{"storageClassName":"'$NEWSTORAGECLASS'"}}'
    
  3. Prepare a new PVC with the updated storageClass

    We need to modify the saved /tmp/pvc.yaml.

    1. Remove “last-applied-configuration”:

      sed -i '/kubectl.kubernetes.io\/last-applied-configuration: |/ { N; d; }' /tmp/pvc.yaml
      
    2. Update existing storageClassName to the new one:

      sed -i 's/storageClassName: .*/storageClassName: '$NEWSTORAGECLASS'/g' /tmp/pvc.yaml
      
  4. Apply the updated /tmp/pvc.yaml

    kubectl apply -f /tmp/pvc.yaml
    
  5. Update the PV to bind with the new PVC

    We must allow the new PVC to bind correctly to the old PV. First fetch the new PVC UID, then patch the PV with it so Kubernetes knows which PVC the PV belongs to.

    1. Retrieve the new PVC UID:

      PVCUID=$(kubectl get -n "$NAMESPACE" pvc "$PVC" -o custom-columns=UID:.metadata.uid --no-headers)
      
    2. Patch the PV with the new UID of the PVC:

      kubectl patch pv "$PV" -p '{"spec":{"claimRef":{"uid":"'$PVCUID'"}}}'
      
  6. Reset the Reclaim Policy of the volume to Delete:

    kubectl patch pv $PV -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
    
  7. Completed.

    • Verify that the volume works as expected.
    • Update your manifests to reflect the new storageClassName.
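
The sed edits from the steps above can be rehearsed on a scratch file before touching the saved backup. A sketch, assuming a minimal stand-in for pvc.yaml (file path and contents here are illustrative):

```shell
NEWSTORAGECLASS=v2-1k

# Minimal stand-in for the saved PVC manifest:
cat > /tmp/pvc-demo.yaml <<'EOF'
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolumeClaim"}
spec:
  storageClassName: 8k
EOF

# The same edits as in the guide, applied to the scratch copy:
sed -i '/kubectl.kubernetes.io\/last-applied-configuration: |/ { N; d; }' /tmp/pvc-demo.yaml
sed -i 's/storageClassName: .*/storageClassName: '$NEWSTORAGECLASS'/g' /tmp/pvc-demo.yaml

cat /tmp/pvc-demo.yaml
```

After running it, the annotation block is gone and storageClassName reads v2-1k, which is exactly what should happen to your real /tmp/pvc.yaml.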

5 - Ingress and cert-manager

Using Ingress resources to expose services

Follow along demo

In this piece, we show all steps to expose a web service using an Ingress resource. Additionally, we demonstrate how to enable TLS, by using cert-manager to request a Let’s Encrypt certificate.

Prerequisites

  1. A DNS record pointing at the public IP address of your worker nodes. In the examples, all references to the domain example.tld must be replaced by the domain you wish to issue certificates for. Configuring DNS is out of scope for this documentation.
  2. For clusters created on or after Kubernetes 1.26 you need to ensure there is an Ingress controller and cert-manager installed.

Create resources

Create a file called ingress.yaml with the following content:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-web-service
  name: my-web-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-web-service
  template:
    metadata:
      labels:
        app: my-web-service
    spec:
      securityContext:
        runAsUser: 1001
        fsGroup: 1001
      containers:
      - image: k8s.gcr.io/serve_hostname
        name: servehostname
        ports:
        - containerPort: 9376
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: my-web-service
  name: my-web-service
spec:
  ports:
  - port: 9376
    protocol: TCP
    targetPort: 9376
  selector:
    app: my-web-service
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-web-service-ingress
  annotations:
    cert-manager.io/issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.tld
    secretName: example-tld
  rules:
  - host: example.tld
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-web-service
            port:
              number: 9376

Then create the resources in the cluster by running: kubectl apply -f ingress.yaml

Run kubectl get ingress and you should see output similar to this:

NAME                     CLASS   HOSTS         ADDRESS         PORTS     AGE
my-web-service-ingress   nginx   example.tld   91.197.41.241   80, 443   39s

If not, wait a while and try again. Once you see output similar to the above you should be able to reach your service at http://example.tld.

Exposing TCP services

If you wish to expose TCP services, note that the tcp-services ConfigMap is located in the default namespace in our clusters.

Enabling TLS

A simple way to enable TLS for your service is by requesting a certificate using the Let’s Encrypt CA. This only requires a few simple steps.

Begin by creating a file called issuer.yaml with the following content:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt ACME server for production certificates
    server: https://acme-v02.api.letsencrypt.org/directory
    # This email address will receive notifications if certificate renewal fails
    email: valid-email@example.tld
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx

Replace the email address with your own. Then create the Issuer in the cluster by running: kubectl apply -f issuer.yaml

Next edit the file called ingress.yaml from the previous example and make sure the Ingress resource matches the example below:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-web-service-ingress
  annotations:
    cert-manager.io/issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.tld
    secretName: example-tld
  rules:
  - host: example.tld
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-web-service
            port:
              number: 9376

Make sure to replace all references to example.tld by your own domain. Then update the resources by running: kubectl apply -f ingress.yaml

Wait a couple of minutes and your service should be reachable at https://example.tld with a valid certificate.

Network policies

If you are using network policies you will need to add a NetworkPolicy that allows traffic from the ingress controller to the temporary pod that performs the HTTP challenge. With the default NGINX Ingress Controller provided by us, this policy should do the trick.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: letsencrypt-http-challenge
spec:
  policyTypes:
  - Ingress
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  ingress:
  - ports:
    - port: http
    from:
    - namespaceSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx

Advanced usage

For more advanced use cases please refer to the documentation provided by each project or contact our support:

6 - Install and upgrade cert-manager

A guide showing you how to install, upgrade and remove cert-manager

Starting at Kubernetes version v1.26, our default configured clusters are delivered without cert-manager.

This guide will help you get a working, up-to-date cert-manager and provides instructions for how to upgrade and delete it. Running your own is useful if you want full control.

The guide is based on the cert-manager Helm chart, found here. We take advantage of the option to install CRDs with kubectl, as recommended for a production setup.

Prerequisites

Helm needs to be provided with the correct repository:

  1. Setup helm repo

    helm repo add jetstack https://charts.jetstack.io --force-update
    
  2. If you already have a namespace named elx-cert-manager, you first need to remove some resources:

    kubectl -n elx-cert-manager delete svc cert-manager cert-manager-webhook
    kubectl -n elx-cert-manager delete deployments.apps cert-manager cert-manager-cainjector cert-manager-webhook
    kubectl delete namespace elx-cert-manager
    

Install

  1. Prepare and install the CRDs:

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.crds.yaml
    
  2. Run Helm install:

    helm install \
      cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --version v1.14.4
    

    A full list of available Helm values is on cert-manager’s ArtifactHub page.

  3. Verify the installation with cmctl, the cert-manager CLI (https://cert-manager.io/docs/reference/cmctl/#installation):

    cmctl check api
    

    If everything is working you should get the message: The cert-manager API is ready.

Upgrade

The setup used above is referenced in the topic “CRDs managed separately”.

In these examples <version> is “v1.14.4”.
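
Keeping <version> in a variable keeps the CRD and chart versions in sync. A small sketch (the variable name is our own choice):

```shell
VERSION=v1.14.4
CRDS_URL="https://github.com/cert-manager/cert-manager/releases/download/${VERSION}/cert-manager.crds.yaml"
echo "$CRDS_URL"

# Then:
#   kubectl apply -f "$CRDS_URL"
#   helm upgrade cert-manager jetstack/cert-manager --namespace cert-manager --version "$VERSION"
```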

  1. Update the CRDs:

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/<version>/cert-manager.crds.yaml
    
  2. Update the Helm chart:

    helm upgrade cert-manager jetstack/cert-manager --namespace cert-manager --version v1.14.4 
    

Uninstall

To uninstall, use the guide here.

7 - Install and upgrade ingress-nginx

A guide showing you how to install, upgrade and remove ingress-nginx.

This guide will help you get a working, up-to-date ingress controller and provides instructions for how to upgrade and delete it. Running your own is useful if you want full control.

The guide is based on the ingress-nginx Helm chart, found here.

Prerequisites

Helm needs to be provided with the correct repository:

  1. Setup helm repo

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    
  2. Make sure to update repo cache

    helm repo update
    

Generate values.yaml

We provide settings for two main scenarios of how clients connect to the cluster. The configuration file, values.yaml, must reflect the correct scenario.

  • Customer connects directly to the Ingress:

    controller:
      kind: DaemonSet
      metrics:
        enabled: true
      service:
        enabled: true
        annotations:
          loadbalancer.openstack.org/proxy-protocol: "true"
      ingressClassResource:
        default: true
      publishService:
        enabled: false  
      allowSnippetAnnotations: true
      config:
        use-proxy-protocol: "true"
    defaultBackend:
      enabled: true
    
  • Customer connects via Proxy:

    controller:
      kind: DaemonSet
      metrics:
        enabled: true
      service:
        enabled: true
        #loadBalancerSourceRanges:
        #  - <Proxy(s)-CIDR>
      ingressClassResource:
        default: true
      publishService:
        enabled: false  
      allowSnippetAnnotations: true
      config:
        use-forwarded-headers: "true"
    defaultBackend:
      enabled: true
    
  • Other useful settings:

    For a complete set of options see the upstream documentation here.

      [...]
      service:
        loadBalancerSourceRanges:        # Whitelist source IPs.
          - 133.124.../32
          - 122.123.../24
        annotations:
          loadbalancer.openstack.org/keep-floatingip: "true"  # retain floating IP in floating IP pool.
          loadbalancer.openstack.org/flavor-id: "v1-lb-2"     # specify flavor.
      [...]
    

Install ingress-nginx

Use the values.yaml generated in the previous step.

helm install ingress-nginx ingress-nginx/ingress-nginx --values values.yaml --namespace ingress-nginx --create-namespace

Example output:

NAME: ingress-nginx
LAST DEPLOYED: Tue Jul 18 11:26:17 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
It may take a few minutes for the Load Balancer IP to become available.
You can watch the status by running 'kubectl --namespace default get services -o wide -w ingress-nginx-controller'
[..]

Upgrade ingress-nginx

Use the values.yaml generated in the previous step.

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx --values values.yaml --namespace ingress-nginx

Example output:

Release "ingress-nginx" has been upgraded. Happy Helming!
NAME: ingress-nginx
LAST DEPLOYED: Tue Jul 18 11:29:41 2023
NAMESPACE: default
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
It may take a few minutes for the Load Balancer IP to be available.
You can watch the status by running 'kubectl --namespace default get services -o wide -w ingress-nginx-controller'
[..]

Remove ingress-nginx

Best practice is to use the helm template method to remove the ingress, which allows proper removal of lingering resources; then remove the namespace. Use the values.yaml generated in the previous step.

Note: Avoid running multiple ingress controllers using the same IngressClass.
See more information here.

  1. Run the delete command

    helm template ingress-nginx ingress-nginx/ingress-nginx --values values.yaml --namespace ingress-nginx | kubectl delete -f -
    
  2. Remove the namespace if necessary

    kubectl delete namespace ingress-nginx
    

8 - Load balancers

Using a load balancer to expose services in the cluster

Load balancers in our Elastx Kubernetes CaaS service are provided by OpenStack Octavia in collaboration with the Kubernetes Cloud Provider OpenStack. This article introduces the basics of using services of type LoadBalancer to expose services via OpenStack Octavia load balancers. For more advanced use cases you are encouraged to read the official documentation of each project or to contact our support for assistance.

A quick example

Exposing services using a service of type LoadBalancer will give you a unique public IP backed by an OpenStack Octavia load balancer. This example takes you through the steps of creating such a service.

Create the resources

Create a file called lb.yaml with the following content:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: echoserver
  name: echoserver
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: echoserver
  template:
    metadata:
      labels:
        app.kubernetes.io/name: echoserver
    spec:
      containers:
      - image: gcr.io/google-containers/echoserver:1.10
        name: echoserver
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: echoserver
  name: echoserver
  annotations:
    loadbalancer.openstack.org/x-forwarded-for: "true"
    loadbalancer.openstack.org/flavor-id: 552c16df-dcc1-473d-8683-65e37e094443
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
    name: http
  selector:
    app.kubernetes.io/name: echoserver
  type: LoadBalancer

Then create the resources in the cluster by running: kubectl apply -f lb.yaml

You can watch the load balancer being created by running: kubectl get svc

This should output something like:

NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
echoserver   LoadBalancer   10.233.32.83   <pending>     80:30838/TCP   6s
kubernetes   ClusterIP      10.233.0.1     <none>        443/TCP        10h

The <pending> value in the EXTERNAL-IP column tells us that the load balancer has not yet been fully created.

We can investigate further by running: kubectl describe svc echoserver

Output should look something like this:

Name:                     echoserver
Namespace:                default
Labels:                   app.kubernetes.io/name=echoserver
Annotations:              loadbalancer.openstack.org/x-forwarded-for: true
Selector:                 app.kubernetes.io/name=echoserver
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.233.32.83
IPs:                      10.233.32.83
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  30838/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  115s  service-controller  Ensuring load balancer

Looking at the Events section near the bottom we can see that the Cloud Controller has picked up the order and is provisioning a load balancer.

Running the same command again (kubectl describe svc echoserver) after waiting some time should produce output like:

Name:                     echoserver
Namespace:                default
Labels:                   app.kubernetes.io/name=echoserver
Annotations:              loadbalancer.openstack.org/x-forwarded-for: true
Selector:                 app.kubernetes.io/name=echoserver
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.233.32.83
IPs:                      10.233.32.83
LoadBalancer Ingress:     91.197.41.223
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  30838/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age    From                Message
  ----    ------                ----   ----                -------
  Normal  EnsuringLoadBalancer  8m52s  service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   6m43s  service-controller  Ensured load balancer

Again looking at the Events section we can tell that the Cloud Provider has provisioned the load balancer for us (the EnsuredLoadBalancer event). Furthermore we can see the public IP address associated with the service by checking the LoadBalancer Ingress.

Finally to verify that the load balancer and service are operational run: curl http://<IP address from LoadBalancer Ingress>

Your output should look something like:

Hostname: echoserver-84655f4656-sc4k6

Pod Information:
        -no pod information available-

Server values:
        server_version=nginx: 1.13.3 - lua: 10008

Request Information:
        client_address=10.128.0.3
        method=GET
        real path=/
        query=
        request_version=1.1
        request_scheme=http
        request_uri=http://91.197.41.223:8080/

Request Headers:
        accept=*/*
        host=91.197.41.223
        user-agent=curl/7.68.0
        x-forwarded-for=213.179.7.4

Request Body:
        -no body in request-

Things to note:

  • You do not need to modify security groups when exposing services using load balancers.
  • The client_address is the address of the load balancer and not the client making the request; you can find the real client address in the x-forwarded-for header.
  • The x-forwarded-for header is provided by setting the loadbalancer.openstack.org/x-forwarded-for: "true" on the service. Read more about available annotations in the Advanced usage section.
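
As a sketch, the real client IP can be pulled out of the echoed headers; here we parse the sample request headers shown above rather than a live response:

```shell
sample_headers='accept=*/*
host=91.197.41.223
x-forwarded-for=213.179.7.4'

client_ip=$(echo "$sample_headers" | sed -n 's/^x-forwarded-for=//p')
echo "$client_ip"   # 213.179.7.4
```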

Advanced usage

For more advanced use cases please refer to the documentation provided by each project or contact our support:

Good to know

Load balancers are billable resources

Adding services of type LoadBalancer will create load balancers in OpenStack, which is a billable resource and you will be charged for them.

Loadbalancer statuses

Load balancers within OpenStack have two distinct statuses, which may cause confusion regarding their meanings:

  • Provisioning Status: This status reflects the overall condition of the load balancer itself. If any issues arise with the load balancer, this status will indicate them. Should you encounter any problems with this status, please don’t hesitate to contact Elastx support for assistance.
  • Operating Status: This status indicates the health of the configured backends, typically the nodes within your cluster, especially when health checks are enabled (which is the default setting). Note that an operating status other than ONLINE doesn’t necessarily indicate a problem, as it depends on your specific configuration. If a service is only exposed on a single node, for instance, this is to be expected, since load balancers by default distribute traffic across all cluster nodes.

Provisioning status codes

Code            Description
ACTIVE          The entity was provisioned successfully
DELETED         The entity has been successfully deleted
ERROR           Provisioning failed
PENDING_CREATE  The entity is being created
PENDING_UPDATE  The entity is being updated
PENDING_DELETE  The entity is being deleted

Operating status codes

Code        Description
ONLINE      Entity is operating normally; all pool members are healthy
DRAINING    The member is not accepting new connections
OFFLINE     Entity is administratively disabled
DEGRADED    One or more of the entity's components are in ERROR
ERROR       The entity has failed; the member is failing its health monitoring checks; all of the pool members are in ERROR
NO_MONITOR  No health monitor is configured for this entity and its status is unknown

High availability properties

OpenStack Octavia load balancers are placed in two of our three availability zones. This is a limitation imposed by the OpenStack Octavia project.

Reconfiguring using annotations

Reconfiguring the load balancers using annotations is not as dynamic and smooth as one would hope. For now, to change the configuration of a load balancer the service needs to be deleted and a new one created.

Load balancer protocols

Load balancers support multiple protocols. In general, we recommend avoiding http and https, simply because they do not perform as well as the other protocols.

Instead, use tcp or HAProxy's proxy protocol, and run an ingress controller that is responsible for proxying and TLS termination within the cluster.
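A sketch of a tcp-based service for an ingress controller using the proxy protocol. The loadbalancer.openstack.org/proxy-protocol annotation is assumed here; check the Advanced usage documentation for the annotations available in your cluster, and note the ingress controller itself must also be configured to accept the PROXY protocol:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx              # hypothetical ingress controller service
  annotations:
    # Preserve client IPs at the TCP level instead of using http/https listeners
    loadbalancer.openstack.org/proxy-protocol: "true"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx   # placeholder selector
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
```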

Load Balancer Flavors

Load balancers come in multiple flavors. The biggest difference is how much traffic they can handle. If no flavor is specified, we default to v1-lb-1. However, this flavor can only push around 200 Mbit/s. For customers who need more throughput, we have a couple of flavors to choose from:

ID                                    Name     Specs      Approx Traffic
16cce6f9-9120-4199-8f0a-8a76c21a8536  v1-lb-1  1G, 1 CPU  200 Mbit/s
48ba211c-20f1-4098-9216-d28f3716a305  v1-lb-2  1G, 2 CPU  400 Mbit/s
b4a85cd7-abe0-41aa-9928-d15b69770fd4  v1-lb-4  2G, 4 CPU  800 Mbit/s
1161b39a-a947-4af4-9bda-73b341e1ef47  v1-lb-8  4G, 8 CPU  1600 Mbit/s

To select a flavor for your Load Balancer, add the following to the Kubernetes Service .metadata.annotations:

loadbalancer.openstack.org/flavor-id: <id-of-your-flavor>

Note that this is a destructive operation when modifying an existing Service; it will remove the current Load Balancer and create a new one (with a new public IP).

Full example configuration for a basic LoadBalancer service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    loadbalancer.openstack.org/flavor-id: b4a85cd7-abe0-41aa-9928-d15b69770fd4
  name: my-loadbalancer
spec:
  ports:
  - name: http-80
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: my-application
  type: LoadBalancer

9 - Persistent volumes

Using persistent volumes

Persistent volumes in our Elastx Kubernetes CaaS service are provided by OpenStack Cinder. Volumes are dynamically provisioned by Kubernetes Cloud Provider OpenStack.

Storage classes

The number in the storage class name refers to IOPS; for example, 8k means 8000 IOPS.

See our pricing page under the table Storage to calculate your costs.

Below is the list of storage classes provided in newly created clusters. If you see other storage classes in your cluster, consider them legacy and please migrate data away from them. We provide a guide to Change PV StorageClass.

$ kubectl get storageclasses
NAME              PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
v2-128k           cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-16k            cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-1k (default)   cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-32k            cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-4k             cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-64k            cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-8k             cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d

Example of PersistentVolumeClaim

A quick example of how to create an unused 1Gi persistent volume claim named example:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: v2-16k

$ kubectl get persistentvolumeclaim
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
example   Bound    pvc-f8b1dc7f-db84-11e8-bda5-fa163e3803b4   1Gi        RWO            v2-16k            18s
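To use the claim, reference it from a pod spec. A minimal sketch, where the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                # placeholder name
spec:
  containers:
  - name: app
    image: nginx                   # placeholder image
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example           # the PVC created above
```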

Good to know

Cross mounting of volumes between nodes

Cross mounting of volumes is not supported! That is, a volume can only be mounted by a node residing in the same availability zone as the volume. Plan accordingly to ensure high availability!

Limit of volumes and pods per node

If a higher number of volumes or pods is required, consider adding additional worker nodes.

Kubernetes version  Max pods/node  Max volumes/node
v1.25 and lower     110            25
v1.26 and higher    110            125
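As a back-of-the-envelope sketch (the volume count is illustrative), the minimum number of worker nodes for a given volume count on v1.26 and higher is a simple ceiling division against the per-node limit:

```shell
# How many worker nodes are needed for 300 volumes with a
# 125 volumes/node limit (v1.26 and higher)?
volumes=300
limit=125
# Ceiling division: (volumes + limit - 1) / limit
nodes=$(( (volumes + limit - 1) / limit ))
echo "$nodes"   # prints 3
```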

Encryption

All volumes are encrypted at rest in hardware.

Volume type hostPath

A volume of type hostPath is in reality just a local directory on a specific node, mounted into a pod. This means data is stored locally and will be unavailable if the pod is ever rescheduled on another node. This is expected during cluster upgrades or maintenance, but it may also happen for other reasons, for example if a pod crashes or a node is malfunctioning. Malfunctioning nodes are automatically healed, meaning they are automatically replaced.
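For illustration, a hostPath volume looks like this (the pod name, image, and path are placeholders); the data stays on whichever node the pod lands on:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-example           # placeholder name
spec:
  containers:
  - name: app
    image: nginx                   # placeholder image
    volumeMounts:
    - mountPath: /cache
      name: local-cache
  volumes:
  - name: local-cache
    hostPath:
      # Directory on the node itself; inaccessible to the pod
      # if it is rescheduled onto another node
      path: /var/tmp/cache
      type: DirectoryOrCreate
```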

You can read more about hostpath here.

If you are looking for a way to store persistent data, we recommend using PVCs. PVCs can move between nodes within one data center, meaning any data stored will still be present even if the pod or node is recreated.

Known issues

Resizing encrypted volumes

Legacy: encrypted volumes do not resize properly; please contact our support if you wish to resize such a volume.

10 - Kubernetes API whitelist

How to limit access to the kubernetes API

In our Kubernetes Services, we rely on OpenStack load balancers in front of the control planes to ensure traffic is sent to a functional node. Whitelisting of access to the API server is controlled in the load balancer in front of the API. Currently, managing the IP-range whitelist requires a support ticket here.

Please submit a ticket with the CIDR ranges for the IPs you wish to whitelist. We are happy to help you ASAP.

Note: All Elastx IP ranges are always included.

In the future, we expect to have this functionality available self-service style.