1 - Announcements

Announcement for Kubernetes CaaS

2024-06-28 New default Kubernetes StorageClass

We are changing the default StorageClass from v1-dynamic-40 to v2-1k ahead of schedule.

Background
The change had already been applied, by accident, to clusters created on versions after v1.26, which meant we diverged from the changes advertised in our changelog.

By committing to making this change actively for all clusters, we are catching up with reality and making our clusters uniform.

Impact
Customers who do not specify a storage class will see no impact when creating new volumes; in this scenario a volume is created with the new default, v2-1k. Customers who actively specify the old v1-dynamic-40 will also see no impact, as this StorageClass is still supported.

General note
To simplify the upcoming, necessary migration to the v2 storage classes, please stop creating new volumes with StorageClasses that are not prefixed with “v2-”.
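For reference, a PersistentVolumeClaim that explicitly requests the v2-1k class looks like the sketch below; the claim name and size are illustrative, not taken from any real cluster:

```yaml
# Illustrative PVC; the name and size are examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-example
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: v2-1k   # omit this field to get the cluster default (now v2-1k)
  resources:
    requests:
      storage: 10Gi
```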

A list of available StorageClasses and their respective pricing can be found here: https://elastx.se/en/openstack/pricing

A guide on how to migrate volumes between StorageClasses can be found here: https://docs.elastx.cloud/docs/kubernetes/guides/change_storageclass/

2024-03-08 Kubernetes v1.26 upgrade notice

This announcement applies to ECP customers who have not yet upgraded to v1.26.

We have received, and acted upon, customer feedback regarding the v1.26 upgrade. Based on this valuable feedback, we are introducing two new options to ensure a suitable upgrade path for your cluster.

Ingress and Certmanager

  • We will not require customers to take ownership of the Ingress and Certmanager as advertised previously. We will continue to provide a managed Ingress/Certmanager.

A new cluster free of charge

  • You have the option to request a new cluster, in which you can set up your services at your own pace. You can choose the Kubernetes version; we support 1.26 and 1.29 (soon 1.30). The cluster is free of charge for 30 days and after that becomes part of your standard environment.
  • We expect you to migrate your workloads from the old cluster to the new one, and then cancel the old cluster via a Zendesk ticket.

What’s next?
Our team will initiate contact via a Zendesk ticket to discuss your preferences and gather the necessary configuration options. We will initially propose a date and time for the upgrade.

Meanwhile, please have a look at our updated version of the migration guide to v1.26:
https://docs.elastx.cloud/docs/kubernetes/knowledge-base/migration-to-caasv2/.

In case you have any technical inquiries please submit a support ticket at:
https://support.elastx.se.

We are happy to help and guide you through the upgrade process.

2023-12-08 Kubernetes CaaS updates including autoscaling

We are happy to announce our new Kubernetes CaaS lifecycle management, with support for both worker node auto scaling and auto healing. We have reworked a great deal of the service backend, which speeds up changes, lets you run clusters more efficiently, and handles increased load without manual intervention.

All new clusters will automatically be deployed using our new backend. Existing clusters will need to be running Kubernetes 1.25 in order to be upgraded. We plan to contact all customers during Q1 2024 in order to plan this together with the Kubernetes 1.26 upgrade.

When upgrading, there are a few changes that need immediate action. Most notably the ingress will be migrated to a load balancer setup. We have information on all changes in more detail here: https://docs.elastx.cloud/docs/kubernetes/knowledge-base/migration-to-caasv2/

You can find information, specifications and pricing here, https://elastx.se/en/kubernetes/.

Service documentation is available here, https://docs.elastx.cloud/docs/kubernetes/.

If you have any general questions or would like to sign-up please contact us at hello@elastx.se.

For any technical questions please register a support ticket at https://support.elastx.se.

2 - Overview

Elastx Kubernetes CaaS

Elastx Kubernetes CaaS consists of a fully redundant Kubernetes cluster spread over three separate physical locations (availability zones) in Stockholm, Sweden. We offer managed addons and monitoring 24x7, including support.

Overview of Elastx Kubernetes CaaS data centers

Features

Elastx Kubernetes CaaS runs on top of our high-performance OpenStack IaaS platform, and we integrate with the features it provides.

  • High availability: Cluster nodes are spread over our three availability zones; combined with our connectivity, this gives you a solid platform to build highly available services on.

  • Load Balancer: Services that use the type “LoadBalancer” in Kubernetes integrate with OpenStack Octavia. Each service exposed this way gets its own public IP (Floating IP in OpenStack lingo).

  • Persistent Storage: When you create a PersistentVolumeClaim, Kubernetes creates a volume using OpenStack Cinder and attaches it to the node where your pod(s) are scheduled.

  • Auto scaling: Starting in CaaS 2.0 we offer node autoscaling. Autoscaling works by checking the resources your workload is requesting. It can help you scale your clusters when you need to run jobs, or when your application scales out due to more traffic or users than normal.

  • Standards conformant: Our clusters are certified by the CNCF Conformance Program ensuring interoperability with Cloud Native technologies and minimizing vendor lock-in.
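As a sketch of the Load Balancer integration described above, a standard Service of type LoadBalancer is all that is needed; the name, selector, and ports are placeholders:

```yaml
# Example Service; Kubernetes asks OpenStack Octavia for a load balancer,
# and the service is exposed on its own Floating IP.
apiVersion: v1
kind: Service
metadata:
  name: web-example
spec:
  type: LoadBalancer
  selector:
    app: web          # placeholder label
  ports:
    - port: 80
      targetPort: 8080
```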

Good to know

Design your Cloud

We expect customers to design their setup so that it does not require access to OpenStack Horizon. This is to future-proof the product. In practice: do not place other instances in the same OpenStack project, and do not use Swift (object store) in the same project. We are happy to provide a separate Swift project, and a secondary OpenStack project, for all such needs.

Persistent volumes

Cross availability zone mounting of volumes is not supported. Therefore, volumes can only be mounted by nodes in the same availability zone.

Ordering and scaling

Ordering and scaling of clusters is currently a manual process involving contact with either our sales department or our support. This is a known limitation, but we are quick to respond and a cluster is typically delivered within a business day.

Since Elastx Private Kubernetes 2.0 we offer auto scaling of workload nodes. Scaling is based on resource requests, which means it relies on the administrator setting realistic requests on the workload. Configuring auto-scaling options is currently a manual process involving contact with either our sales department or our support.
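Because the autoscaler acts on requested resources, workloads should declare realistic requests. A minimal sketch, with all names, the image, and the values purely illustrative:

```yaml
# The autoscaler adds nodes when pending pods' requests cannot be
# satisfied, so requests should reflect real usage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-example
  template:
    metadata:
      labels:
        app: app-example
    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest   # placeholder image
          resources:
            requests:
              cpu: "250m"
              memory: 256Mi
```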

Cluster add-ons

We offer a managed cert-manager and a managed NGINX Ingress Controller.

If you want full control, we have assembled guides with everything you need to install the same Ingress Controller and cert-manager that we provide. The guides include configuration examples and instructions for lifecycle management; they can be found in the sections Getting Started and Guides.

3 - Changelog

Latest changes for Elastx Kubernetes CaaS

3.1 - Changelog for Kubernetes 1.32

Changelog for Kubernetes 1.32

Versions

The deployed Kubernetes patch version varies based on when your cluster is deployed or upgraded. We strive to use the latest versions available.

The current release leverages Kubernetes 1.32. The official release blogpost can be found here, with its corresponding official changelog.

Optional addons

  • ingress-nginx is provided with version v1.12.1
  • cert-manager is provided with version v1.16.3

Major changes

  • We previously announced that legacy StorageClasses would be removed in v1.32. This removal has been postponed to v1.34.

  • Flow control flowcontrol.apiserver.k8s.io/v1beta3 will be removed. The replacement flowcontrol.apiserver.k8s.io/v1 was implemented in Kubernetes 1.29

  • More details can be found in Kubernetes official documentation.

Noteworthy changes in coming versions

V1.34

  • The 4k, 8k, 16k, and v1-dynamic-40 storage classes are scheduled to be removed. Existing volumes will not be affected, but the ability to create new volumes with these legacy classes will be removed. Please migrate manifests that specify storage classes to the classes prefixed with v2-, which have been available since Kubernetes 1.26 and have been the default since 2024-06-28, as made public in the announcement.
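One way to find manifests that still reference the legacy classes is a plain text search over your YAML; the ./manifests path is an assumption about your repository layout, so adjust it as needed:

```shell
# Find manifests that still pin a legacy storage class
# (assumes your YAML lives under ./manifests; adjust as needed).
grep -rnE 'storageClassName: *"?(4k|8k|16k|v1-dynamic-40)"?' ./manifests \
  || echo "no legacy storage classes found"
```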

Is downtime expected

The cluster is expected to be up and running during the upgrade; however, pods will restart when they are migrated to a new node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.2 - Changelog for Kubernetes 1.31

Changelog for Kubernetes 1.31

Versions

The deployed Kubernetes patch version varies based on when your cluster is deployed or upgraded. We strive to use the latest versions available.

The current release leverages Kubernetes 1.31. The official release blogpost can be found here, with its corresponding official changelog.

Major changes

In case there are major changes that impact Elastx Kubernetes cluster deployments, they will be listed here.

Noteworthy API changes in coming version Kubernetes 1.32

  • Flow control flowcontrol.apiserver.k8s.io/v1beta3 will be removed. The replacement flowcontrol.apiserver.k8s.io/v1 was implemented in Kubernetes 1.29

  • The removal of the 4k, 8k, 16k, and v1-dynamic-40 storage classes has been postponed to v1.34. Please migrate to the v2 storage classes, which have been available since Kubernetes 1.26 and have been the default since Kubernetes 1.30.

  • More details can be found in Kubernetes official documentation.

Other noteworthy deprecations

  • Please migrate to the v2 storage classes, which have been available since Kubernetes 1.26. They have been the default for existing clusters since the announcement, and the default for new clusters starting at Kubernetes v1.30.

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone
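Manifests that pin pods by zone or instance type should switch to the replacement labels. A sketch, where the zone value and image are placeholders:

```yaml
# Use the stable topology labels instead of the deprecated
# failure-domain.beta.kubernetes.io/* variants.
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-example
spec:
  nodeSelector:
    topology.kubernetes.io/zone: sto1   # placeholder zone name
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
```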

Is downtime expected

The cluster is expected to be up and running during the upgrade; however, pods will restart when they are migrated to a new node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.3 - Changelog for Kubernetes 1.30

Changelog for Kubernetes 1.30

Versions

The deployed Kubernetes version varies based on when your cluster is deployed. We aim to deploy clusters using the latest patch release of Kubernetes.

Current release is Kubernetes 1.30.1

Major changes

  • New default storageclass v2-1k
  • New clusters will only have v2 storage classes available.
  • nodelocaldns will be removed for all clusters where it’s still deployed. This change affects only clusters created prior to Kubernetes 1.26, as the feature was deprecated in that version.
  • Clusters created before Kubernetes 1.26 will have their public domains removed. In Kubernetes 1.26, we migrated to using a LoadBalancer and its IP instead. If you are using an old kubeconfig with an active domain, please fetch a new one.

APIs removed in Kubernetes 1.32

More details can be found in Kubernetes official documentation.

  • Flow control flowcontrol.apiserver.k8s.io/v1beta3. The replacement flowcontrol.apiserver.k8s.io/v1 was implemented in Kubernetes 1.29
  • The 4k, 8k, 16k, and v1-dynamic-40 storage classes will be removed. Please migrate to the v2 storage classes, which have been available since Kubernetes 1.26 and have been the default since Kubernetes 1.30.

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected

The cluster is expected to be up and running during the upgrade; however, pods will restart when they are migrated to a new node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.4 - Changelog for Kubernetes 1.29

Changelog for Kubernetes 1.29

Versions

The deployed Kubernetes version varies based on when your cluster is deployed. We aim to deploy clusters using the latest patch release of Kubernetes.

Current release is Kubernetes 1.29.1

Major changes

  • Flow control flowcontrol.apiserver.k8s.io/v1beta2. The replacement flowcontrol.apiserver.k8s.io/v1beta3 was implemented in Kubernetes 1.26

APIs removed in Kubernetes 1.32

More details can be found in Kubernetes official documentation.

  • Flow control flowcontrol.apiserver.k8s.io/v1beta3. The replacement flowcontrol.apiserver.k8s.io/v1 was implemented in Kubernetes 1.29

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected

The cluster is expected to be up and running during the upgrade; however, pods will restart when they are migrated to a new node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.5 - Changelog for Kubernetes 1.28

Changelog for Kubernetes 1.28

Versions

The deployed Kubernetes version varies based on when your cluster is deployed. We aim to deploy clusters using the latest patch release of Kubernetes.

Current release is Kubernetes 1.28.6

Major changes

  • No major changes

APIs removed in Kubernetes 1.29

More details can be found in Kubernetes official documentation.

  • Flow control flowcontrol.apiserver.k8s.io/v1beta2. The replacement flowcontrol.apiserver.k8s.io/v1beta3 was implemented in Kubernetes 1.26

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected

The cluster is expected to be up and running during the upgrade; however, pods will restart when they are migrated to a new node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.6 - Changelog for Kubernetes 1.27

Changelog for Kubernetes 1.27

Versions

The deployed Kubernetes version varies based on when your cluster is deployed. We aim to deploy clusters using the latest patch release of Kubernetes.

Current release is Kubernetes 1.27.10

Major changes

  • Removed API CSIStorageCapacity storage.k8s.io/v1beta1. The replacement storage.k8s.io/v1 was implemented in Kubernetes 1.24

APIs removed in Kubernetes 1.29

More details can be found in Kubernetes official documentation.

  • Flow control flowcontrol.apiserver.k8s.io/v1beta2. The replacement flowcontrol.apiserver.k8s.io/v1beta3 was implemented in Kubernetes 1.26

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected

The cluster is expected to be up and running during the upgrade; however, pods will restart when they are migrated to a new node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.7 - Changelog for Kubernetes 1.26

Changelog for Kubernetes 1.26

Versions

The deployed Kubernetes version varies based on when your cluster is deployed. We aim to deploy clusters using the latest patch release of Kubernetes.

Current release is Kubernetes 1.26.13

Major changes

  • Added support for node autoscaling
  • Removed API Flow control resources flowcontrol.apiserver.k8s.io/v1beta1. The replacement flowcontrol.apiserver.k8s.io/v1beta2 was implemented in Kubernetes 1.23
  • Removed API HorizontalPodAutoscaler autoscaling/v2beta2. The replacement autoscaling/v2 was introduced in Kubernetes 1.23
  • We no longer deploy NodeLocal DNSCache for new clusters

Deprecations

Note that all deprecated items will be removed in a future Kubernetes release. This does not mean you need to make any changes right now. However, we recommend that you start migrating your applications in order to avoid issues in future releases.

  • In Kubernetes 1.26 the storage class 4k will be removed from all clusters. This only affects clusters created prior to Kubernetes 1.23. Instead, use v1-dynamic-40, which has been the default storage class since Kubernetes 1.23. This change was originally planned for Kubernetes 1.25 but has been pushed back to 1.26 to allow some extra time for migrations.

APIs removed in Kubernetes 1.27

More details can be found in Kubernetes official documentation.

  • CSIStorageCapacity storage.k8s.io/v1beta1. The replacement storage.k8s.io/v1 was implemented in Kubernetes 1.24

APIs removed in Kubernetes 1.29

More details can be found in Kubernetes official documentation.

  • Flow control flowcontrol.apiserver.k8s.io/v1beta2. The replacement flowcontrol.apiserver.k8s.io/v1beta3 was implemented in Kubernetes 1.26

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected

The cluster is expected to be up and running during the upgrade; however, pods will restart when they are migrated to a new node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.8 - Changelog for Kubernetes 1.25

Changelog for Kubernetes 1.25

Versions

  • Kubernetes 1.25.6
  • Nginx-ingress: 1.4.0
  • Certmanager: 1.11.0

Major changes

  • Pod Security Policies has been removed.
  • CronJob API batch/v1beta1 has been removed and is replaced with batch/v1 that was implemented in Kubernetes 1.21
  • EndpointSlice API discovery.k8s.io/v1beta1 has been removed and is replaced with discovery.k8s.io/v1 that was implemented in Kubernetes 1.21
  • Event API events.k8s.io/v1beta1 has been removed and is replaced with events.k8s.io/v1 that was implemented in Kubernetes 1.19
  • PodDisruptionBudget API policy/v1beta1 has been removed and is replaced with policy/v1 that was implemented in Kubernetes 1.21
  • RuntimeClass API node.k8s.io/v1beta1 has been removed and is replaced with node.k8s.io/v1 that was implemented in Kubernetes 1.20

Deprecations

Note that all deprecated items will be removed in a future Kubernetes release. This does not mean you need to make any changes right now. However, we recommend that you start migrating your applications in order to avoid issues in future releases.

  • In Kubernetes 1.26 the storage class 4k will be removed from all clusters. This only affects clusters created prior to Kubernetes 1.23. Instead, use v1-dynamic-40, which has been the default storage class since Kubernetes 1.23. This change was originally planned for Kubernetes 1.25 but has been pushed back to 1.26 to allow some extra time for migrations.

APIs removed in Kubernetes 1.26

More details can be found in Kubernetes official documentation.

  • Flow control resources flowcontrol.apiserver.k8s.io/v1beta1. The replacement flowcontrol.apiserver.k8s.io/v1beta2 was implemented in Kubernetes 1.23
  • HorizontalPodAutoscaler autoscaling/v2beta2. The replacement autoscaling/v2 was introduced in Kubernetes 1.23

APIs removed in Kubernetes 1.27

More details can be found in Kubernetes official documentation.

  • CSIStorageCapacity storage.k8s.io/v1beta1. The replacement storage.k8s.io/v1 was implemented in Kubernetes 1.24

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected

The upgrade drains (moves all workload off) one node at a time, patches that node, and brings it back into the cluster. Only after all Deployments and StatefulSets are running again do we continue with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Custom changes to security groups without the “-customer” suffix will be lost

All changes to security groups not suffixed with “-customer” will be lost during the upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.9 - Changelog for Kubernetes 1.24

Changelog for Kubernetes 1.24

Versions

  • Kubernetes 1.24.6
  • Nginx-ingress: 1.4.0
  • Certmanager: 1.10.0

Major changes

  • The node-role.kubernetes.io/master= label is removed from all control plane nodes, instead use the node-role.kubernetes.io/control-plane= label.
  • The taint node-role.kubernetes.io/control-plane:NoSchedule has been added to all control plane nodes.
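Workloads that genuinely need to run on control plane nodes now need a matching toleration and should select nodes via the replacement label. A hedged sketch (the pod name and image are placeholders):

```yaml
# Sketch of a pod that tolerates the new control-plane taint and
# selects control-plane nodes via the replacement label.
apiVersion: v1
kind: Pod
metadata:
  name: control-plane-example
spec:
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
```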

Deprecations

Note that all deprecated items will be removed in a future Kubernetes release. This does not mean you need to make any changes right now. However, we recommend that you start migrating your applications in order to avoid issues in future releases.

  • In Kubernetes 1.25 the storage class 4k will be removed from all clusters. This only affects clusters created prior to Kubernetes 1.23. Instead, use v1-dynamic-40, which has been the default storage class since Kubernetes 1.23.

APIs removed in Kubernetes 1.25

More details can be found in Kubernetes official documentation.

  • Pod Security Policies will be removed in Kubernetes 1.25
  • CronJob batch/v1beta1. The new API batch/v1 was implemented in Kubernetes 1.21 (this is a drop in replacement)
  • EndpointSlice discovery.k8s.io/v1beta1. The new API discovery.k8s.io/v1 was implemented in Kubernetes 1.21
  • Event events.k8s.io/v1beta1. The new API events.k8s.io/v1 was implemented in Kubernetes 1.19
  • PodDisruptionBudget policy/v1beta1. The new API policy/v1 was implemented in Kubernetes 1.21
  • RuntimeClass node.k8s.io/v1beta1. The new API node.k8s.io/v1 was implemented in Kubernetes 1.20

APIs removed in Kubernetes 1.26

More details can be found in Kubernetes official documentation.

  • Flow control resources flowcontrol.apiserver.k8s.io/v1beta1. The replacement flowcontrol.apiserver.k8s.io/v1beta2 was implemented in Kubernetes 1.23
  • HorizontalPodAutoscaler autoscaling/v2beta2. The replacement autoscaling/v2 was introduced in Kubernetes 1.23

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected

The upgrade drains (moves all workload off) one node at a time, patches that node, and brings it back into the cluster. Only after all Deployments and StatefulSets are running again do we continue with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Custom changes to security groups without the “-customer” suffix will be lost

All changes to security groups not suffixed with “-customer” will be lost during the upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.10 - Changelog for Kubernetes 1.23

Changelog for Kubernetes 1.23

Versions

  • Kubernetes 1.23.7
  • Nginx-ingress: 1.3.0
  • Certmanager: 1.9.1

Major changes

  • A new storage class v1-dynamic-40 is introduced and set as the default storage class. All information about this storage class can be found here.
  • Worker and control plane nodes now use v1-c2-m8-d80 as their default flavor. You can find a complete list of all available flavors here.
  • All nodes will be migrated to the updated flavors during the upgrade. The new flavors have the same specifications; however, the flavor ID will change. This affects customers that use the node.kubernetes.io/instance-type label on nodes.
  • Control plane nodes will have their disk migrated from the deprecated 4k storage class to v1-dynamic-40.
  • Starting from Kubernetes 1.23 we require three control plane (master) nodes.

Flavor mapping

Old flavor        New flavor
v1-standard-2     v1-c2-m8-d80
v1-standard-4     v1-c4-m16-d160
v1-standard-8     v1-c8-m32-d320
v1-dedicated-8    d1-c8-m58-d800
v2-dedicated-8    d2-c8-m120-d1.6k

Changes affecting new clusters:

What happened to the metrics/monitoring node?

Previously, when creating new clusters or upgrading clusters to Kubernetes 1.23, we added an extra node that handled monitoring. This node is no longer needed: all its services have been converted to run inside the Kubernetes cluster, so clusters created or upgraded from now on will not get an extra node added. Clusters that currently have the monitoring node will be migrated to the new setup within the upcoming weeks (the change is non-service affecting).

Deprecations

Note that all deprecated items will be removed in a future Kubernetes release. This does not mean you need to make any changes right now. However, we recommend that you start migrating your applications in order to avoid issues in future releases.

  • In Kubernetes 1.25 the storage class 4k will be removed from all clusters created prior to Kubernetes 1.23.

APIs removed in Kubernetes 1.25

More details can be found in Kubernetes official documentation.

  • Pod Security Policies will be removed in Kubernetes 1.25
  • CronJob batch/v1beta1. The new API batch/v1 was implemented in Kubernetes 1.21 (this is a drop in replacement)
  • EndpointSlice discovery.k8s.io/v1beta1. The new API discovery.k8s.io/v1 was implemented in Kubernetes 1.21
  • Event events.k8s.io/v1beta1. The new API events.k8s.io/v1 was implemented in Kubernetes 1.19
  • PodDisruptionBudget policy/v1beta1. The new API policy/v1 was implemented in Kubernetes 1.21
  • RuntimeClass node.k8s.io/v1beta1. The new API node.k8s.io/v1 was implemented in Kubernetes 1.20

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: The following changes do not have a set Kubernetes release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected

The upgrade drains (move all workload from) one node at the time, patches that node and brings it back in the cluster. First after all deployments and statefulsets are running again we will continue on with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on worker and control-plane nodes are lost during upgrade.

Custom changes to non -customer security groups will be lost

All changes to security groups not suffixed with “-customer” will be lost during the upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.11 - Changelog for Kubernetes 1.22

Changelog for Kubernetes 1.22

Versions

  • Kubernetes 1.22.8
  • Nginx-ingress: 1.1.1
  • Certmanager: 1.6.3

Major changes

  • When our ingress is installed we set it as the default ingress, meaning it will be used unless a custom ingress class is specified
  • Clusters are now running containerd instead of docker. This should not affect your workload at all
  • We reserve 5% RAM on all nodes making it easier to calculate how much is left for your workload
  • All components deployed by Elastx have tolerations for NoSchedule taints by default
  • Certmanager cert-manager.io/v1alpha2, cert-manager.io/v1alpha3, cert-manager.io/v1beta1, acme.cert-manager.io/v1alpha2, acme.cert-manager.io/v1alpha3 and acme.cert-manager.io/v1beta1 APIs are no longer served. All existing resources will be converted automatically to cert-manager.io/v1 and acme.cert-manager.io/v1, however you will still need to update your local manifests
  • Several old APIs are no longer served. A complete list can be found in Kubernetes documentation

Changes affecting new clusters:

  • All new clusters will have the cluster domain cluster.local by default
  • The encrypted *-enc storage-classes (4k-enc, 8k-enc and 16k-enc) are no longer available to new clusters since they are deprecated for removal in Openstack. Do not worry, all our other storage classes (4k, 8k, 16k and future classes) are now encrypted by default. Read our full announcement here

Deprecations

Note that all deprecations will be removed in a future Kubernetes release. This does not mean you need to perform any changes right now. However, we recommend that you start migrating your applications to avoid issues in future releases.

APIs removed in Kubernetes 1.25

More details can be found in Kubernetes official documentation.

  • Pod Security Policies will be removed in Kubernetes 1.25
  • CronJob batch/v1beta1. The new API batch/v1 was implemented in Kubernetes 1.21 (this is a drop-in replacement)
  • EndpointSlice discovery.k8s.io/v1beta1. The new API discovery.k8s.io/v1 was implemented in Kubernetes 1.21
  • Event events.k8s.io/v1beta1. The new API events.k8s.io/v1 was implemented in Kubernetes 1.19
  • PodDisruptionBudget policy/v1beta1. The new API policy/v1 was implemented in Kubernetes 1.21
  • RuntimeClass node.k8s.io/v1beta1. The new API node.k8s.io/v1 was implemented in Kubernetes 1.20
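
One way to get ahead of these removals is to scan your manifest repository for the deprecated apiVersions listed above. A sketch using grep (run it from the root of your manifest repository; the pattern covers the beta APIs in the list):

```shell
# List files that still use apiVersions removed in Kubernetes 1.25.
pattern='apiVersion: *(batch/v1beta1|discovery\.k8s\.io/v1beta1|events\.k8s\.io/v1beta1|policy/v1beta1|node\.k8s\.io/v1beta1)'
grep -rlE "$pattern" . || echo "no deprecated apiVersions found"
```

Note that this only finds versions spelled out in your own manifests; objects created by Helm charts or operators need to be checked at their source.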

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release. You can follow the list below to see which labels are being replaced:

Please note: the following changes do not have a set Kubernetes removal release. However, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

Is downtime expected?

The upgrade drains (moves all workload from) one node at a time, patches that node, and brings it back into the cluster. Only after all Deployments and StatefulSets are running again do we continue with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on nodes are lost during upgrade.

Custom changes to non-customer security groups will be lost

All changes to security groups not suffixed with “-customer” will be lost during the upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.12 - Changelog for Kubernetes 1.21

Changelog for Kubernetes 1.21

Versions

  • Kubernetes 1.21.5
  • Nginx-ingress: 1.0.1
  • Certmanager: 1.5.3

Major changes

  • Load Balancers are by default allowed to talk to all tcp ports on worker nodes.

New Kubernetes features:

  • The ability to create immutable secrets and configmaps.
  • Cronjobs are now stable and the new API batch/v1 is implemented.
  • Graceful node shutdown: when shutting down worker nodes, this is detected by Kubernetes and pods will be evicted gracefully.
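
An immutable ConfigMap can be sketched like this; once immutable: true is set, later updates to the data are rejected and the object must be deleted and recreated to change it (the name and data are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
immutable: true      # updates to data are rejected from now on
```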

Deprecations

Note that all deprecations will be removed in a future Kubernetes release. This does not mean you need to perform any changes now; however, we recommend that you start migrating your applications to avoid issues in future releases.

APIs removed in Kubernetes 1.22

A guide on how to migrate from affected APIs can be found in the Kubernetes upstream documentation.

  • Ingress extensions/v1beta1 and networking.k8s.io/v1beta1
  • ValidatingWebhookConfiguration and MutatingWebhookConfiguration admissionregistration.k8s.io/v1beta1
  • CustomResourceDefinition apiextensions.k8s.io/v1beta1
  • CertificateSigningRequest certificates.k8s.io/v1beta1
  • APIService apiregistration.k8s.io/v1beta1
  • TokenReview authentication.k8s.io/v1beta1
  • Lease coordination.k8s.io/v1beta1
  • SubjectAccessReview, LocalSubjectAccessReview and SelfSubjectAccessReview authorization.k8s.io/v1beta1
  • Certmanager api v1alpha2, v1alpha3 and v1beta1
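
For the Ingress change, networking.k8s.io/v1 restructures the backend fields compared to extensions/v1beta1 and adds a required pathType. A minimal sketch (host, service name and port are placeholders):

```yaml
apiVersion: networking.k8s.io/v1   # previously extensions/v1beta1
kind: Ingress
metadata:
  name: example
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix           # required in v1
        backend:
          service:                 # serviceName/servicePort became service.name/service.port.number
            name: example
            port:
              number: 80
```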

Other noteworthy deprecations

Kubernetes beta topology labels on nodes are deprecated and will be removed in a future release, follow the list below to see what labels are being replaced:

Please note: the following changes do not have a set Kubernetes removal release; however, the replacement labels are already implemented.

  • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
  • beta.kubernetes.io/arch -> kubernetes.io/arch
  • beta.kubernetes.io/os -> kubernetes.io/os
  • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
  • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone

APIs removed in Kubernetes 1.25

More details can be found in the Kubernetes official documentation.

  • Pod Security Policies will be removed in Kubernetes 1.25.
  • CronJob batch/v1beta1, the new API batch/v1 was implemented in Kubernetes 1.21 (this is a drop-in replacement)
  • EndpointSlice discovery.k8s.io/v1beta1, the new API discovery.k8s.io/v1 was implemented in Kubernetes 1.21
  • Event events.k8s.io/v1beta1, the new API events.k8s.io/v1 was implemented in Kubernetes 1.19
  • PodDisruptionBudget policy/v1beta1, the new API policy/v1 was implemented in Kubernetes 1.21
  • RuntimeClass node.k8s.io/v1beta1, the new API node.k8s.io/v1 was implemented in Kubernetes 1.20

Is downtime expected?

The upgrade drains (moves all workload from) one node at a time, patches that node, and brings it back into the cluster. Only after all Deployments and StatefulSets are running again do we continue with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on nodes are lost during upgrade.

Custom changes to non-customer security groups will be lost

All changes to security groups not suffixed with “-customer” will be lost during the upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.13 - Changelog for Kubernetes 1.20

Changelog for Kubernetes 1.20

Versions

  • Kubernetes 1.20.7
  • Nginx-ingress: 0.46.0
  • Certmanager: 1.3.1

Major changes

  • RBAC api rbac.authorization.k8s.io/v1alpha1 has been removed. Instead use the replacement rbac.authorization.k8s.io/v1.
  • We no longer support new clusters being created with pod security policy enabled. Instead we recommend using OPA Gatekeeper. In case you have any questions regarding this, contact our support and we will help you out.
  • The built-in Cinder Volume Provider has gone from deprecated to disabled. Any volumes that are still using it will have to be migrated, see Known Issues.

Deprecations

  • Ingress api extensions/v1beta1 will be removed in Kubernetes 1.22.
  • Kubernetes beta labels on nodes are deprecated and will be removed in a future release; follow the list below to see which label replaces the old one:
    • beta.kubernetes.io/instance-type -> node.kubernetes.io/instance-type
    • beta.kubernetes.io/arch -> kubernetes.io/arch
    • beta.kubernetes.io/os -> kubernetes.io/os
    • failure-domain.beta.kubernetes.io/region -> topology.kubernetes.io/region
    • failure-domain.beta.kubernetes.io/zone -> topology.kubernetes.io/zone
  • Certmanager api v1alpha2, v1alpha3 and v1beta1 will be removed in a future release. We strongly recommend that you upgrade to the new v1 api.
  • RBAC api rbac.authorization.k8s.io/v1beta1 will be removed in an upcoming release. The apis are replaced with rbac.authorization.k8s.io/v1.
  • Pod Security Policies will be removed in Kubernetes 1.25 in all clusters having the feature enabled. Instead we recommend OPA Gatekeeper.

Is downtime expected?

The upgrade drains (moves all workload from) one node at a time, patches that node, and brings it back into the cluster. Only after all Deployments and StatefulSets are running again do we continue with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on nodes are lost during upgrade.

Custom changes to non-customer security groups will be lost

All changes to security groups not suffixed with “-customer” will be lost during the upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

Volumes using built-in Cinder Volume Provider will be converted

During the upgrade to 1.20 Elastx staff will upgrade any volumes still being managed by the built-in Cinder Volume Provider. No action is needed on the customer side, but it will produce events and possibly log events that may raise concern.

To get a list of Persistent Volumes that are affected you can run this command before the upgrade:

$ kubectl get pv -o json | jq -r '.items[] | select (.spec.cinder != null) | .metadata.name'
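
To illustrate what the filter matches, here it is applied to a fabricated minimal sample of kubectl get pv -o json output; only the in-tree .spec.cinder volume is selected:

```shell
# Fabricated sample: one in-tree Cinder PV and one CSI PV.
cat > pv-sample.json <<'EOF'
{"items":[
  {"metadata":{"name":"pv-intree"},"spec":{"cinder":{"volumeID":"abc-123"}}},
  {"metadata":{"name":"pv-csi"},"spec":{"csi":{"driver":"cinder.csi.openstack.org"}}}
]}
EOF
# Same filter as above; only the in-tree volume is printed.
jq -r '.items[] | select(.spec.cinder != null) | .metadata.name' pv-sample.json
# prints: pv-intree
```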

Volumes that have been converted will show an event under the PersistentVolumeClaim object asserting that data has been lost. This statement is false: it is due to the underlying Persistent Volume being disconnected for a brief moment while it was attached to the new CSI-based Cinder Volume Provider.

Bitnami (and possibly other) images and runAsGroup

Some Bitnami images silently assume they are run with the equivalent of runAsGroup: 0. This was the Kubernetes default until 1.20.x. The result is strange looking permission errors on startup and can cause workloads to fail.

At least the Bitnami PostgreSQL and RabbitMQ images have been confirmed as having these issues.

To find out if there are problematic workloads in your cluster you can run the following commands:

    kubectl get pods -A -o yaml | grep image: | sort | uniq | grep bitnami

If any images turn up there may be issues. NB: other images may have been built using Bitnami images as a base; these will not show up using the above command.

Solution without PSP

On clusters not running PSP it should suffice to just add:

    runAsGroup: 0

to the securityContext for the affected containers.
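
A sketch of such a container-level securityContext (the container name and image are placeholders):

```yaml
spec:
  containers:
  - name: postgresql
    image: bitnami/postgresql:latest   # example image
    securityContext:
      runAsGroup: 0        # group the Bitnami image expects
      runAsNonRoot: true   # still refuse to run as the root user
```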

Solution with PSP

On clusters running PSP some more actions need to be taken. The restricted PSP forbids running as group 0 so a new one needs to be created, such as:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default,runtime/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: restricted-runasgroup0
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  requiredDropCapabilities:
  - ALL
  runAsGroup:
    ranges:
    - max: 65535
      min: 0
    rule: MustRunAs
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim

Furthermore a ClusterRole allowing the use of said PSP is needed:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: psp:restricted-runasgroup0
rules:
- apiGroups:
  - policy
  resourceNames:
  - restricted-runasgroup0
  resources:
  - podsecuritypolicies
  verbs:
  - use

And finally you need to bind the ServiceAccounts that need to run as group 0 to the ClusterRole with a ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp:restricted-runasgroup0
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:restricted-runasgroup0
subjects:
- kind: ServiceAccount
  name: default
  namespace: keycloak
- kind: ServiceAccount
  name: XXX
  namespace: YYY

Then it's just a matter of adding:

runAsGroup: 0

to the securityContext for the affected containers.

3.14 - Changelog for Kubernetes 1.19

Changelog for Kubernetes 1.19

Versions

  • Kubernetes 1.19.7
  • Nginx-ingress: 0.43.0
  • Certmanager: 1.2.0

Major changes

  • New security groups are implemented where you can store all your firewall rules. The new security groups will be persistent between upgrades and are called CLUSTERNAME-k8s-worker-customer and CLUSTERNAME-k8s-master-customer (CLUSTERNAME will be replaced with the actual cluster name). With this change we will remove our previous default firewall rules that allowed public traffic to the Kubernetes cluster; this includes the following services:

    • Master API (port 6443)
    • Ingress (port 80 & 443)
    • Nodeports (ports 30000 to 32767)

    If you currently have any of the mentioned ports open you either need to add them to the new security groups (created during the upgrade) or mention this during the planning discussion and we will assist you with this. Please be aware that any rules added to the new security groups are not managed by us and you are responsible for keeping them up to date.

Deprecations

  • Ingress api extensions/v1beta1 will be removed in kubernetes 1.22
  • RBAC api rbac.authorization.k8s.io/v1alpha1 and rbac.authorization.k8s.io/v1beta1 will be removed in kubernetes 1.20. The apis are replaced with rbac.authorization.k8s.io/v1.
  • The node label beta.kubernetes.io/instance-type will be removed in an upcoming release. Use node.kubernetes.io/instance-type instead.
  • Certmanager api v1alpha2, v1alpha3 and v1beta1 will be removed in a future release. We strongly recommend that you upgrade to the new v1 api

Is downtime expected?

The upgrade drains (moves all workload from) one node at a time, patches that node, and brings it back into the cluster. Only after all Deployments and StatefulSets are running again do we continue with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on nodes are lost during upgrade.

Custom security groups will be lost during upgrade

All custom security groups bound inside OpenStack will be detached during the upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

3.15 - Changelog for Kubernetes 1.18

Changelog for Kubernetes 1.18

Versions

  • Kubernetes 1.18.9
  • Nginx-ingress: 0.40.0
  • Certmanager: 1.0.3

Major changes

  • Moved the tcp-services configmap used by our ingress controller to the default namespace.

Deprecations

  • Ingress api extensions/v1beta1 will be removed in kubernetes 1.22
  • RBAC api rbac.authorization.k8s.io/v1alpha1 and rbac.authorization.k8s.io/v1beta1 will be removed in kubernetes 1.20. The apis are replaced with rbac.authorization.k8s.io/v1.
  • The node label beta.kubernetes.io/instance-type will be removed in an upcoming release. Use node.kubernetes.io/instance-type instead.
  • Certmanager api v1alpha2, v1alpha3 and v1beta1 will be removed in a future release. We strongly recommend that you upgrade to the new v1 api
  • Accessing the Kubernetes dashboard over the Kubernetes API. This feature will not be added to new clusters; however, if your cluster already has this available, it will continue working until Kubernetes 1.19

Removals

  • Some older deprecated metrics have been removed. More information regarding this can be found in the official Kubernetes changelog: Link to Kubernetes changelog

Is downtime expected?

For this upgrade we expect a short downtime on the ingress: it should be no longer than 5 minutes and hopefully even under 1 minute.

The upgrade drains (moves all workload from) one node at a time, patches that node, and brings it back into the cluster. Only after all Deployments and StatefulSets are running again do we continue with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

Resize problem on volumes created before Kubernetes 1.16

Volume expansion sometimes fails on volumes created before Kubernetes 1.16.

A workaround exists by adding an annotation on the affected volumes, an example command:

kubectl annotate --overwrite pvc PVCNAME volume.kubernetes.io/storage-resizer=cinder.csi.openstack.org

3.16 - Changelog for Kubernetes 1.17

Changelog for Kubernetes 1.17

Versions

  • Kubernetes 1.17.9
  • Nginx-ingress: 0.32.0
  • Certmanager: 0.15.0

Major changes

  • We can now combine nodes with multiple different flavors within one cluster
  • Fixed a bug where some external network connections got stuck (MTU mismatch, Calico)
  • Enabled Calico's metrics endpoint
  • New and improved monitoring system
  • Ingress now only supports serving HTTP on port 80 and HTTPS on port 443
  • Cert-manager using new APIs: Cert-manager info

Deprecations

  • Ingress api extensions/v1beta1 will be removed in kubernetes 1.22
  • RBAC api rbac.authorization.k8s.io/v1alpha1 and rbac.authorization.k8s.io/v1beta1 will be removed in kubernetes 1.20. The apis are replaced with rbac.authorization.k8s.io/v1.
  • The node label beta.kubernetes.io/instance-type will be removed in an upcoming release. Use node.kubernetes.io/instance-type instead.

Removals

Custom ingress ports

We no longer support using custom ingress ports. From 1.17, HTTP traffic will be received on port 80 and HTTPS on port 443.

You can check what ports you are using with the following command:

kubectl get service -n elx-nginx-ingress elx-nginx-ingress-controller

If you aren’t using ports 80 and 443, please be aware that the ports your ingress listens on will change during the upgrade to Kubernetes 1.17. The ELASTX team will contact you before the upgrade takes place so we can come up with a solution together.

Old Kubernetes APIs

A complete list of APIs that will be removed in this version:

  • NetworkPolicy
    • extensions/v1beta1
  • PodSecurityPolicy
    • extensions/v1beta1
  • DaemonSet
    • extensions/v1beta1
    • apps/v1beta2
  • Deployment
    • extensions/v1beta1
    • apps/v1beta1
    • apps/v1beta2
  • StatefulSet
    • apps/v1beta1
    • apps/v1beta2
  • ReplicaSet
    • extensions/v1beta1
    • apps/v1beta1
    • apps/v1beta2

Is downtime expected?

For this upgrade we expect a short downtime on the ingress: it should be no longer than 5 minutes and hopefully even under 1 minute.

The upgrade drains (moves all workload from) one node at a time, patches that node, and brings it back into the cluster. Only after all Deployments and StatefulSets are running again do we continue with the next node.

Known issues

Custom node taints and labels lost during upgrade

All custom taints and labels on nodes are lost during upgrade.

Snapshots are not working

There is currently a limitation in the snapshot controller: it is not topology aware.

Resize problem on volumes created before Kubernetes 1.16

Volume expansion sometimes fails on volumes created before Kubernetes 1.16.

A workaround exists by adding an annotation on the affected volumes, an example command:

kubectl annotate --overwrite pvc PVCNAME volume.kubernetes.io/storage-resizer=cinder.csi.openstack.org

4 - Getting started

Getting started with Elastx Kubernetes CaaS

4.1 - Cluster configuration

Cluster configuration and optional features

There are many options available for your cluster. Most options have a sane default but can be overridden on request.

A default cluster comes with 3 control plane and 3 worker nodes. To connect all nodes we create a network, default (10.128.0.0/22). We also deploy monitoring to ensure the functionality of all cluster components. Most of this is just a default and can be overridden.

Common options

Nodes

The standard configuration consists of the following:

  • Three control plane nodes, one in each of our availability zones. Flavor: v2-c2-m8-d80
  • Three worker nodes, one in each of our availability zones, in a single nodegroup. Flavor: v2-c2-m8-d80

Minimal configuration

  • Three control plane nodes, one in each of our availability zones. Flavor: v2-c2-m8-d80

  • One worker node, Flavor: v2-c2-m8-d80

    This is the minimal configuration offered. Scaling to larger flavors and adding nodes are supported. Autoscaling is not supported with a single worker node.

    Note: SLA is different for minimal configuration type of cluster. SLA’s can be found here.

Nodegroups and multiple flavors

To keep node management as easy as possible we make use of nodegroups. A nodegroup consists of one or more nodes with one flavor and a list of availability zones to deploy nodes in. Clusters are delivered by default with a nodegroup called workers containing 3 nodes, one in each AZ. A nodegroup is limited to one flavor, meaning all nodes in the nodegroup will have the same amount of CPU, RAM and disk.

You could have multiple nodegroups, if you for example want to target workload on separate nodes or in case you wish to consume multiple flavors.

A few examples of nodegroups:

Name      Flavor            AZ list           Min node count  Max node count (autoscaling)
worker    v2-c2-m8-d80      STO1, STO2, STO3  3               0
database  d2-c8-m120-d1.6k  STO1, STO2, STO3  3               0
frontend  v2-c4-m16-d160    STO1, STO2, STO3  3               12
jobs      v2-c4-m16-d160    STO1              1               3

In the examples we see worker, our default nodegroup, along with separate nodegroups for databases and frontends: the database runs on dedicated nodes, while the frontend runs on smaller nodes but can autoscale between 3 and 12 nodes based on current cluster requests. We also have a jobs nodegroup with one node in STO1 that can scale up to 3 nodes, all placed inside STO1. You can read more about autoscaling here.

Nodegroups can be changed at any time. Please also note that we have auto-healing, meaning that if any of your nodes stops working for any reason, we will replace it. More about auto-healing can be found here

Network

By default we create a cluster network (10.128.0.0/22). However, we can use another subnet per customer request. The most common scenario for a customer requesting another subnet is exposing multiple Kubernetes clusters over a VPN.

Please make sure to inform us during the ordering process if you wish to use a custom subnet, since we cannot replace the network after creation; we would then need to recreate your entire cluster.

We currently only support CIDRs within the 10.0.0.0/8 range, and at least a /24. Both nodes and load balancers use IPs from this range, meaning you need a sizable network from the beginning.

Cluster domain

We default all clusters to “cluster.local”. This is similar to most other providers. If you wish to have another cluster domain please let us know during the ordering procedure, since it cannot be replaced after cluster creation.

Worker nodes Floating IPs

By default, our clusters come with nodes that do not have any Floating IPs attached to them. If, for any reason, you require Floating IPs on your worker nodes, please inform us, and we can configure your cluster accordingly. The most common use case for Floating IPs is to ensure predictable source IPs. Please note that enabling or disabling Floating IPs requires recreating all your nodes, one by one, but can be done at any time.

Since during upgrades we create a new node prior to removing an old node, you would need to have an additional IP address on standby. If you wish us to preallocate a list or range of IP addresses, just mention this and we will configure your cluster accordingly.

Please note that only worker nodes consume Floating IP addresses; control plane nodes do not make use of Floating IPs.

Less common options

OIDC

If you wish to integrate with your existing OIDC-compatible IdP, for example Microsoft AD or Google Workspace, this is supported directly in the Kubernetes API server.

By default we ship clusters with this option disabled. If you wish to make use of OIDC, just let us know when ordering the cluster or afterwards. OIDC can be enabled, disabled or changed at any time.

Cluster add-ons

We currently offer managed cert-manager, NGINX Ingress and elx-nodegroup-controller.

Cert-manager

Cert-manager (link to cert-manager.io) helps you to manage TLS certificates. A common use case is to use Let's Encrypt to “automatically” generate certificates for web apps. However, the functionality goes much deeper. We also have usage instructions and a guide if you wish to deploy cert-manager yourself.

Ingress

An ingress controller in a Kubernetes cluster manages how external traffic reaches your services. It routes requests based on rules, handles load balancing, and can integrate with cert-manager to manage TLS certificates. This simplifies traffic handling and improves scalability and security compared to exposing each service individually. We have a usage guide with examples that can be found here.

We have chosen to use ingress-nginx and to support ingress, we limit what custom configurations can be made per cluster. We offer two “modes”. One that we call direct mode, which is the default behavior. This mode is used when end-clients connect directly to your ingress. We also have a proxy mode for when a proxy (e.g., WAF) is used in front of your ingress. When running in proxy mode, we also have the ability to limit traffic from specific IP addresses, which we recommend doing for security reasons. If you are unsure which mode to use or how to handle IP whitelisting, just let us know and we will help you choose the best options for your use case.

If you are interested in removing any limitations, we’ve assembled guides with everything you need to install the same IngressController as we provide. This will give you full control. The various resources give configuration examples and instructions for lifecycle management. These can be found here.

elx-nodegroup-controller

The nodegroup controller is useful when customers want to use custom taints or labels on their nodes. It supports matching nodes based on nodegroup or by name. The controller can be found on Github if you wish to inspect the code or deploy it yourself.

4.2 - Order a new cluster

How to order a new cluster

How to order or remove a cluster

Ordering and scaling of clusters is currently a manual process involving contact with either our sales department or our support. This is a known limitation, but may change in the future.

4.3 - Accessing your cluster

How to access your cluster

In order to access your cluster there are a couple of things you need to do. First you need to make sure you have the correct tools installed; the default client for interacting with Kubernetes clusters is called kubectl. Instructions for installing it on your system can be found by following the link.

You may of course use any Kubernetes client you wish to access your cluster; however, setting up other clients is beyond the scope of this documentation.

Credentials (kubeconfig)

Once you have a client you can use to access the cluster you will need to fetch the credentials for your cluster. You can find them by logging in to Elastx OpenStack IaaS. When logged in, you can find the kubeconfig file for your cluster by clicking on the “Object Storage” menu option in the left-hand side menu. Then click on “Containers”; you should now see a container with the same name as your cluster (clusters are named “customer-cluster_name”). Clicking on the container should reveal a file called admin.conf in the right-hand pane. Click on the “Download” button to the right of the file name to download it to your computer.

NOTE These credentials will be rotated when your cluster is upgraded so you should periodically fetch new credentials to make sure you have a fresh set.

NOTE The kubeconfig you just downloaded has full administrator privileges.

Configuring kubectl to use your credentials

In order for kubectl to be able to use the credentials you just downloaded you need to either place the credentials in the default location or otherwise configure kubectl to utilize them. The official documentation covers this process in detail.

Verify access

To verify you’ve got access to the cluster you can run something like this:

$ kubectl get nodes
NAME                           STATUS   ROLES           AGE   VERSION
hux-lab1-control-plane-c9bmm   Ready    control-plane   14h   v1.27.3
hux-lab1-control-plane-j5p42   Ready    control-plane   14h   v1.27.3
hux-lab1-control-plane-wlwr8   Ready    control-plane   14h   v1.27.3
hux-lab1-worker-447sn          Ready    <none>          13h   v1.27.3
hux-lab1-worker-9ltbp          Ready    <none>          14h   v1.27.3
hux-lab1-worker-vszmc          Ready    <none>          14h   v1.27.3

If your output looks similar then you should be good to go! If it looks very different or contains error messages, don’t hesitate to contact our support if you can’t figure out how to solve it on your own.

Restrict access

Access to the API server is controlled in the loadbalancer in front of the API. Currently, managing the IP-range allowlist requires a support ticket here. All Elastx IP ranges are always included.

Instructions for older versions

Everything under this section is only for clusters running older versions of our private Kubernetes service.

Security groups

Note: This part only applies to clusters not already running Private Kubernetes 2.0 or later.

This part applies if your cluster was created prior to Kubernetes 1.26, or if we have specifically informed you that it does.

If you are not sure if this part applies, you can validate it by checking if there is a security group called cluster-name-master-customer in your openstack project.

To do so, log in to Elastx Openstack IaaS. When logged in click on the “Network” menu option in the left-hand side menu. Then click on “Security Groups”, finally click on the “Manage Rules” button to the right of the security group named cluster-name-master-customer. To add a rule click on the “Add Rule” button.

For example, to allow access from the ip address 1.2.3.4 configure the rule as follows:

Rule: Custom TCP Rule
Direction: Ingress
Open Port: Port
Port: 6443
Remote: CIDR
CIDR: 1.2.3.4/32

Once you’ve set up rules that allow you to access your cluster you are ready to verify access.

4.4 - Cluster upgrades

How cluster upgrades are managed

Introduction

Kubernetes versions are released approximately three times a year, introducing enhancements, security updates, and bug fixes. The planning and initiation of a cluster upgrade is a manual task that requires coordination with our customers.

To schedule the upgrade of your cluster(s), we require a designated point of contact for coordination.
For customers with multiple clusters, please provide your preferred sequence and timeline for upgrades. If you haven’t shared this information yet, kindly submit a support ticket with these details.

Upgrade Planning

Upgrades are scheduled in consultation with the customer and can be initiated by either Elastx or the customer. If the customer does not initiate the planning of an upgrade, we will reach out to the designated contact in a support ticket at least twice a year with suggested upgrade dates.

NOTE: Upgrades are not performed during our changestop periods:

  • In general, the full month of July through the first week of August
  • December 23rd to January 2nd

Before scheduling and confirming a time slot, please review the relevant changelog and the Kubernetes Deprecated API Migration guide.

Upgrade Process

NOTE Please refrain from making any changes while the upgrade is in progress.

The duration of the upgrade typically ranges from 1 to 3 hours, depending on the size of the cluster.
The upgrade starts with the control plane nodes followed by the worker nodes, one nodegroup at a time.

Steps Involved

  1. A new node with the newer version is added to the cluster to replace the old node.
  2. Once the new node is ready, the old node is drained.
  3. Once all transferable loads have been migrated, the old node is removed from the cluster.
  4. This process is repeated until all nodes in the cluster have been upgraded.

NOTE When using public IPs on worker nodes to ensure a predictable egress IP, a previously unused IP from your allocation will be assigned to the new worker node. A list of all allocated IPs should have been provided to you when you requested public IPs on the worker nodes.
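
If you want to follow the rolling replacement as it happens, watching the node list is usually enough; new nodes appear with the newer version while drained nodes disappear:

```shell
# Watch nodes joining and leaving the cluster during the upgrade.
kubectl get nodes --watch
```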

Support and Communication During Upgrades

The engineer responsible for executing the upgrade will notify you through the support ticket when the upgrade begins and once it is completed. The support ticket serves as the primary channel for communication during the upgrade process. If you have any concerns or questions about the upgrade, please use the support ticket to reach out.

Additional Information

  • Upon request, upgrades can be scheduled outside office hours if needed. Upgrades outside office hours depend on personnel availability and come at an additional fee; see the current price for professional services.
  • Our Kubernetes service includes up to four version upgrades per year; additional upgrades can be performed at an extra cost.
  • To address critical security vulnerabilities, additional upgrades can be performed and will not count against the four upgrades included per year.
  • In a previous Tech-fika, we discussed how to build redundancy and implement autoscaling with our Kubernetes service. You can access the presentation here to help you prepare for a smoother upgrade experience.

4.5 - Your first deployment

An example deployment to get started with your Kubernetes cluster

This page will help you get a deployment up and running and expose it through a load balancer.

Note: This guide is optional and only here to help new Kubernetes users with an example deployment.

Before following this guide you need to have ordered a cluster and followed the Accessing your cluster guide.

You can verify access by running kubectl get nodes; if the output is similar to the example below, you are set to go.

❯ kubectl get nodes
NAME                           STATUS   ROLES           AGE     VERSION
hux-lab1-control-plane-c9bmm   Ready    control-plane   2d18h   v1.27.3
hux-lab1-control-plane-j5p42   Ready    control-plane   2d18h   v1.27.3
hux-lab1-control-plane-wlwr8   Ready    control-plane   2d18h   v1.27.3
hux-lab1-worker-447sn          Ready    <none>          2d18h   v1.27.3
hux-lab1-worker-9ltbp          Ready    <none>          2d18h   v1.27.3
hux-lab1-worker-htfbp          Ready    <none>          15h     v1.27.3
hux-lab1-worker-k56hn          Ready    <none>          16h     v1.27.3

Creating an example deployment

To get started we need something to deploy. Below is a deployment called echoserver that we will use for this example.

  1. Start off by creating a file called deployment.yaml with the content of the deployment below:

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app.kubernetes.io/name: echoserver
      name: echoserver
    spec:
      replicas: 3
      selector:
        matchLabels:
          app.kubernetes.io/name: echoserver
      template:
        metadata:
          labels:
            app.kubernetes.io/name: echoserver
        spec:
          containers:
          - image: gcr.io/google-containers/echoserver:1.10
            name: echoserver
    
  2. After you have created your file we can apply the deployment by running the following command:

    ❯ kubectl apply -f deployment.yaml
    deployment.apps/echoserver created
    
  3. After running the apply command we can verify that 3 pods have been created. This can take a few seconds.

    ❯ kubectl get pod
    NAME                          READY   STATUS    RESTARTS   AGE
    echoserver-545465d8dc-4bqqn   1/1     Running   0          51s
    echoserver-545465d8dc-g5xxr   1/1     Running   0          51s
    echoserver-545465d8dc-ghrj6   1/1     Running   0          51s
    

Exposing our deployment

Once your pods are created, we need to expose our deployment. In this example we create a Service of type LoadBalancer. If you run an application like this in production, you would likely install an ingress controller instead.

  1. First, create a file called service.yaml with the content of the service below:

    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/name: echoserver
      name: echoserver
      annotations:
        loadbalancer.openstack.org/x-forwarded-for: "true"
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 8080
        name: http
      selector:
        app.kubernetes.io/name: echoserver
      type: LoadBalancer
    
  2. After creating the service.yaml file, apply it using kubectl:

    ❯ kubectl apply -f service.yaml
    service/echoserver created
    
  3. We can now inspect our service by running kubectl get service:

    ❯ kubectl get service
    NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
    echoserver   LoadBalancer   10.98.121.166   <pending>     80:31701/TCP   54s
    kubernetes   ClusterIP      10.96.0.1       <none>        443/TCP        2d20h
    

    For the echoserver service, EXTERNAL-IP says <pending>. This means a load balancer is being created but is not yet ready. As soon as the load balancer is up and running, this column will instead show an IP address that we can use to access our application.

    Load balancers usually take around a minute to be created, but can sometimes take a little longer.

  4. Once the load balancer is up and running, kubectl get service should return something like this:

    ❯ kubectl get service
    NAME         TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
    echoserver   LoadBalancer   10.98.121.166   185.24.134.39   80:31701/TCP   2m24s
    kubernetes   ClusterIP      10.96.0.1       <none>          443/TCP        2d20h
    

Access the example deployment

Now, if we open a web browser and visit the IP address, we should get a response looking something like this:

Hostname: echoserver-545465d8dc-ghrj6

Pod Information:
  -no pod information available-

Server values:
  server_version=nginx: 1.13.3 - lua: 10008

Request Information:
  client_address=192.168.252.64
  method=GET
  real path=/
  query=
  request_version=1.1
  request_scheme=http
  request_uri=http://185.24.134.39:8080/

Request Headers:
  accept=text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
  accept-encoding=gzip, deflate
  accept-language=en-US,en;q=0.9,sv;q=0.8
  host=185.24.134.39
  upgrade-insecure-requests=1
  user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
  x-forwarded-for=90.230.66.18

Request Body:
  -no body in request-

The Hostname field shows which pod we reached; if we refresh the page, we should see this value change.
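
You can also observe this from a terminal. Using the example IP from this guide (substitute your own EXTERNAL-IP), a few repeated requests should show different pod names:

```shell
# Print which pod served each of five consecutive requests.
for i in 1 2 3 4 5; do
  curl -s http://185.24.134.39/ | grep Hostname
done
```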

Cleanup

To clean up everything we created, you can run the following set of commands:

  1. We can start off by removing the deployment. To remove a deployment we can use kubectl delete and point it towards the file containing our deployment:

    ❯ kubectl delete -f deployment.yaml
    deployment.apps "echoserver" deleted
    
  2. After our deployment is removed, we can go ahead and remove our service and load balancer. Note that this takes a few seconds since we are waiting for the load balancer to be removed.

    ❯ kubectl delete -f service.yaml
    service "echoserver" deleted
    

4.6 - Recommendations

A list of things we recommend to get the best experience from your Kubernetes cluster

This page describes a list of things that could help you get the best experience out of your cluster.

Note: You do not need to follow this documentation in order to use your cluster

Ingress and cert-manager

To make it easier to expose applications, an ingress controller is commonly deployed.

An ingress controller makes sure that when you go to a specific web page, you are routed to the correct application.

There are a lot of different ingress controllers available. We at Elastx use ingress-nginx and have a guide ready on how to get started. However, you can deploy any ingress controller you wish inside your clusters.

To get a single IP address that you can point your DNS towards, we recommend deploying an ingress controller with a service of type LoadBalancer. More information regarding load balancers can be found in this link.

In order to automatically generate and update TLS certificates cert-manager is commonly deployed side by side with an ingress controller.

We have created a guide on how to get started with ingress-nginx and cert-manager, which can be found in this link.

Requests and limits

Below we describe requests and limits briefly. For a more detailed description, or help setting requests and limits, we recommend the Kubernetes documentation: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

Requests

Requests and limits are critical to enable Kubernetes to make informed decisions on when and where to schedule and limit your workload.

Requests are important for the scheduler. Requests can be seen as “Amount of resources the pod would utilize during normal operation”. This means that the scheduler will allocate the required amount of resources and make sure they are always available to your pod.

Requests also enables the auto-scaler to make decisions on when to scale a cluster up and down.

Limits

Limits define the maximum allowed resource usage for a pod. This is important to avoid slowdowns in other pods running on the same node.

CPU limits: your application will be throttled, or simply run slower, when trying to exceed the limit. Running slower means fewer CPU cycles per given time, that is, added latency. Memory limits are another beast: if a pod tries to use memory above the limit, the pod will be OOM-killed (out of memory).
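
As an illustration, requests and limits can also be set on an existing deployment straight from the command line, for example on the echoserver deployment from the previous chapter (the values here are placeholders, not recommendations):

```shell
# Set requests (what the scheduler reserves) and limits (hard caps)
# on all containers in the deployment.
kubectl set resources deployment echoserver \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=500m,memory=256Mi
```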

Autoscaling

Autoscaling can operate on both node-level and pod-level. To get the absolute best experience we recommend a combination of both.

Scaling nodes

We have built-in support for scaling nodes. To get started with autoscaling we recommend to check the guide in this link.

Scaling pods

The official Kubernetes documentation has a guide on how to accomplish this.

In short, node autoscaling is only considered if you have pods that cannot be scheduled, or if you have set requests on your pods. To automatically scale an application, pod scaling can make sure you get more pods before reaching your pod limit; if more nodes are needed to run the new pods, nodes will automatically be added, and later removed when no longer needed.

Network policies

Network policies can, in short, be seen as Kubernetes' built-in firewall.

Network policies can be used to limit both incoming and outgoing traffic. This is useful, for example, to specify which set of pods is allowed to communicate with a database.

The Kubernetes documentation has an excellent guide on how to get started with network policies in this link.

Pod Security Standards / Pod Security Admission

Pod Security Admission can be used to limit what your pods can do. For example you can make sure pods are not allowed to run as root.

To learn more and get started, we recommend following the Kubernetes documentation in this link.
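
Pod Security Admission is enabled per namespace through labels. As a small sketch, enforcing the restricted profile on a hypothetical namespace my-namespace looks like this:

```shell
# Enforce the "restricted" Pod Security Standard in a namespace;
# pods that violate the profile are rejected at admission time.
kubectl label namespace my-namespace \
  pod-security.kubernetes.io/enforce=restricted
```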

Load Balancers

Load Balancers allow your application to be accessed from the internet. Load Balancers can automatically split traffic to all your nodes to even out load. Load Balancers can also detect if a node is having problems and remove it to avoid displaying errors to end users.

We have a guide on how to get started with load balancers in this link.

5 - Guides

Guides to get more out of your Kubernetes clusters

5.1 - Auto Healing

Automatic Healing for Unresponsive or Failed Kubernetes Nodes

In our Kubernetes Services, we have implemented a robust auto-healing mechanism to ensure the high availability and reliability of our infrastructure. This system is designed to automatically manage and replace unhealthy nodes, thereby minimizing downtime and maintaining the stability of our services.

Auto-Healing Mechanism

Triggers

  1. Unready Node Detection:

    • The auto-healing process is triggered when a node remains in a “NotReady” or “Unknown” state for 15 minutes.
    • This delay allows for transient issues to resolve themselves without unnecessary node replacements.
  2. Node Creation Failure:

    • To ensure new nodes are given adequate time to initialize and join the cluster, we have configured startup timers:
      • Control Plane Nodes:
        • A new control plane node has a maximum startup time of 30 minutes. This extended period accounts for the critical nature and complexity of control plane operations.
      • Worker Nodes:
        • A new worker node has a maximum startup time of 10 minutes, reflecting the relatively simpler setup process compared to control plane nodes.

Actions

  1. Unresponsive Node:
    • Once a node is identified as unready for the specified duration, the auto-healing system deletes the old node.
    • Simultaneously, it initiates the creation of a new node to take its place, ensuring the cluster remains properly sized and functional.

Built-in Failsafe

To prevent cascading failures and to handle scenarios where multiple nodes become unresponsive, we have a built-in failsafe mechanism:

  • Threshold for Unresponsive Nodes:
    • If more than 35% of the nodes in the cluster become unresponsive simultaneously, the failsafe activates.
    • This failsafe blocks any further changes, as such a widespread issue likely indicates a broader underlying problem, such as network or platform-related issues, rather than isolated node failures.

By integrating these features, our Kubernetes Services can automatically handle node failures and maintain high availability, while also providing safeguards against systemic issues. This auto-healing capability ensures that our infrastructure remains resilient, responsive, and capable of supporting continuous service delivery.

5.2 - Auto Scaling

Automatically scale your kubernetes nodes

We now offer autoscaling of nodes.

What is a nodegroup?

To simplify node management we now have nodegroups.

A nodegroup is a set of nodes that spans all 3 of our availability zones. All nodes in a nodegroup use the same flavor, so if you want to mix flavors in your cluster there will be at least one nodegroup per flavor. We can also create custom nodegroups upon request, meaning you can have two nodegroups with the same flavor.

By default, clusters are created with one nodegroup called “worker”. When listing nodes by running kubectl get nodes you can see the nodegroup by looking at the node name: all node names begin with clustername-nodegroup.

In the example below we have the cluster hux-lab1: the default workers are located in the nodegroup worker, and the added nodegroup nodegroup2 holds a few extra nodes.

❯ kubectl get nodes
NAME                           STATUS   ROLES           AGE     VERSION
hux-lab1-control-plane-c9bmm   Ready    control-plane   2d18h   v1.27.3
hux-lab1-control-plane-j5p42   Ready    control-plane   2d18h   v1.27.3
hux-lab1-control-plane-wlwr8   Ready    control-plane   2d18h   v1.27.3
hux-lab1-worker-447sn          Ready    <none>          2d18h   v1.27.3
hux-lab1-worker-9ltbp          Ready    <none>          2d18h   v1.27.3
hux-lab1-worker-htfbp          Ready    <none>          15h     v1.27.3
hux-lab1-worker-k56hn          Ready    <none>          16h     v1.27.3
hux-lab1-nodegroup2-33hbp      Ready    <none>          15h     v1.27.3
hux-lab1-nodegroup2-54j5k      Ready    <none>          16h     v1.27.3

How to activate autoscaling?

Autoscaling currently needs to be configured by Elastx support.

To activate autoscaling we need to know the cluster name and nodegroup, along with the minimum and maximum number of desired nodes. Currently the minimum is set to 3 nodes, though this is subject to change in the future.

Nodes are spread across availability zones, meaning if you want 3 nodes you get one in each availability zone.

Another example is a minimum of 3 nodes and a maximum of 7. This translates to at least one node per availability zone, and at most 3 in STO1 and 2 each in STO2 and STO3. To keep it simple we recommend using increments of 3.

If you are unsure, contact our support and we will help you get the configuration you want.

How does autoscaling know when to add additional nodes?

Nodes are added when they are needed. There are two scenarios:

  1. You have a pod that fails to be scheduled on existing nodes
  2. Scheduled pods request more than 100% of any resource. This method senses the amount of resources per node and can therefore add more than one node at a time if required.

When does the autoscaler scale down nodes?

The autoscaler removes nodes when it senses there are enough free resources to accommodate all current workloads (based on requests) on fewer nodes. To avoid all nodes having 100% resource requests (and thereby usage), there is also a built-in mechanism to ensure that at least 50% of one node's resources are always available to accept additional requests.

This means that if you have a nodegroup with 3 nodes, each with 4 CPU cores, a total of 2 CPU cores must remain unrequested by any workload.
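
The headroom rule above can be sketched as plain shell arithmetic (a simplified model; the real autoscaler also weighs memory and pod-level constraints):

```shell
# Simplified model of the scale-down headroom rule:
# keep at least half of one node's CPU unrequested.
node_cores=4                               # cores per node in the nodegroup
node_count=3                               # nodes currently in the nodegroup
total_cores=$((node_cores * node_count))   # 12 cores in total
headroom=$((node_cores / 2))               # 2 cores must stay unrequested
max_requested=$((total_cores - headroom))  # up to 10 cores may be requested
echo "total=${total_cores} headroom=${headroom} max_requested=${max_requested}"
# prints: total=12 headroom=2 max_requested=10
```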

To avoid triggering the auto-scaling feature excessively, there is a built-in delay of 10 minutes before scale-down actions occur. Scale-up events are triggered immediately.

Can I disable auto scaling after activating it?

Yes, just contact Elastx support and we will help you with this.

When autoscaling is disabled, the node count will be locked. If the number of nodes you wish to keep deviates from the current amount, contact support and we will scale it for you.

5.3 - Cert-manager and Cloudflare demo

Using Cluster Issuer with cert-manager and wildcard DNS

In this guide we will use a Cloudflare-managed domain and our own cert-manager to provide Let's Encrypt certificates for a test deployment.

The guide is suitable if you have a domain connected to a single cluster and would like to issue and manage certificates from within Kubernetes. The setup below is cluster-wide, meaning it can deploy certificates to any namespace specified.

Prerequisites

Setup ClusterIssuer

Create a file to hold the secret of your api token for your Cloudflare DNS. Then create the ClusterIssuer configuration file adapted for Cloudflare.

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager
type: Opaque
stringData:
  api-token: "<your api token>"
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cloudflare-issuer
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <your email>
    privateKeySecretRef:
      name: cloudflare-issuer-key
    solvers:
    - dns01:
        cloudflare:
          email: <your email>
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token

Apply the manifest:

kubectl apply -f cloudflare-issuer.yml

The ClusterIssuer should be ready shortly. Example output:

kubectl get clusterissuers.cert-manager.io 
NAME                READY   AGE
cloudflare-issuer   True    6d18h

Expose a workload and secure with Let’s encrypt certificate

In this section we will set up a deployment with its accompanying service and ingress object. The ingress object will request a certificate for test2.domain.ltd, and once fully up and running, should provide https://test2.domain.ltd with a valid Let's Encrypt certificate.

We’ll use the created ClusterIssuer and let cert-manager request new certificates for any added ingress object. This setup requires a wildcard (“*”) record set up at the DNS provider.

This is how DNS is set up in this particular example: an A record (“domain.ltd”) points to the load balancer IP of the cluster, and a wildcard CNAME record (“*”) points to the A record above.

This example also specifies the namespace “echo2”.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo2-dep
  namespace: echo2
spec:
  selector:
    matchLabels:
      app: echo2
  replicas: 1
  template:
    metadata:
      labels:
        app: echo2
    spec:
      containers:
      - name: echo2
        image: hashicorp/http-echo
        args:
        - "-text=echo2"
        ports:
        - containerPort: 5678
      securityContext:
        runAsUser: 1001
        fsGroup: 1001
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: echo2
  name: echo2-service
  namespace: echo2
spec:
  ports:
    - protocol: TCP
      port: 5678
      targetPort: 5678
  selector:
    app: echo2
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo2-ingress
  namespace: echo2
  annotations:
    cert-manager.io/cluster-issuer: cloudflare-issuer
    kubernetes.io/ingress.class: "nginx"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - test2.domain.ltd
    secretName: test2-domain-tls
  rules:
  - host: test2.domain.ltd
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo2-service
            port:
              number: 5678

The DNS challenge and certificate issuance process takes a couple of minutes. You can follow the progress by watching:

kubectl events -n cert-manager

Once completed, the service should be accessible at https://test2.domain.ltd
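
You can also check the Certificate resource that cert-manager created for the ingress; its name matches the secretName from the ingress manifest in this example:

```shell
# READY "True" for test2-domain-tls means issuance succeeded.
kubectl get certificate -n echo2
```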

5.4 - Change PV StorageClass

How to migrate between storage classes

This guide details all steps to change the storage class of a volume. The instructions can be used to migrate from one storage class to another while retaining data, for example from 8k to v2-4k.

Prerequisites

  • Access to the kubernetes cluster
  • Access to the OpenStack Kubernetes project

Preparation steps

  1. Populate variables

    Complete with relevant names for your setup. Then copy/paste them into the terminal to set them as environment variables that will be used throughout the guide. PVC is the name of the PersistentVolumeClaim you want to migrate, NAMESPACE is its namespace, and NEWSTORAGECLASS is the storage class to migrate to.

    PVC=test1
    NAMESPACE=default
    NEWSTORAGECLASS=v2-1k
    
  2. Fetch and populate the PV name by running:

    PV=$(kubectl get pvc -n $NAMESPACE $PVC -o go-template='{{.spec.volumeName}}')
    
  3. Create backup of PVC and PV configurations

    Fetch the PVC and PV configurations and store in /tmp/ for later use:

    kubectl get pvc -n $NAMESPACE $PVC -o yaml | tee /tmp/pvc.yaml
    kubectl get pv  $PV -o yaml | tee /tmp/pv.yaml
    
  4. Change VolumeReclaimPolicy

    To avoid deletion of the PV when deleting the PVC, the volume needs to have VolumeReclaimPolicy set to Retain.

    Patch:

    kubectl patch pv $PV -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
    
  5. Stop pods from accessing the mounted volume (i.e. delete pods, scale down the StatefulSet, etc.).

  6. Delete the PVC.

    kubectl delete pvc -n "$NAMESPACE" "$PVC"
    

Login to Openstack

  1. Navigate to: Volumes -> Volumes

  2. Make a backup of the volume: from the drop-down to the right, select Backup. The backup is good practice; it is not used in the following steps.

  3. Change the storage type to the desired type. The volume should now, or shortly, have status Available. From the drop-down to the right, select Edit Volume -> Change Volume Type:

    • Select your desired storage type
    • Select Migration policy=Ondemand

    The window will close, and the backend will update and, if necessary, migrate the volume (to the v2 storage platform). The status becomes “Volume retyping”. Wait until this completes.

    We have a complementary guide here.
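
If you prefer the OpenStack CLI over the web interface, the retype in step 3 can be done with a command along these lines, where VOLUME_ID is the ID of the volume backing your PV (verify the flags against your CLI version):

```shell
# Retype the volume to the new storage class, migrating data if required.
openstack volume set --type v2-1k --retype-policy on-demand VOLUME_ID
```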

Back to kubernetes

  1. Release the tie between PVC and PV

    The PV is still referencing its old PVC in the claimRef, found under spec.claimRef.uid. This UID needs to be nullified to release the PV, allowing it to be adopted by a PVC with the correct storageClass.

    Patch claimRef to null:

    kubectl patch pv "$PV" -p '{"spec":{"claimRef":{"namespace":"'$NAMESPACE'","name":"'$PVC'","uid":null}}}'
    
  2. The PV storageClassName in Kubernetes no longer matches its counterpart in OpenStack.

    We need to patch the storageClassName reference in the PV:

    kubectl patch pv "$PV" -p '{"spec":{"storageClassName":"'$NEWSTORAGECLASS'"}}'
    
  3. Prepare a new PVC with the updated storageClass

    We need to modify the saved /tmp/pvc.yaml.

    1. Remove “last-applied-configuration”:

      sed -i '/kubectl.kubernetes.io\/last-applied-configuration: |/ { N; d; }' /tmp/pvc.yaml
      
    2. Update existing storageClassName to the new one:

      sed -i 's/storageClassName: .*/storageClassName: '$NEWSTORAGECLASS'/g' /tmp/pvc.yaml
      
  4. Apply the updated /tmp/pvc.yaml

    kubectl apply -f /tmp/pvc.yaml
    
  5. Update the PV to bind with the new PVC

    We must allow the new PVC to bind correctly to the old PV. First fetch the new PVC UID, then patch the PV with that UID so Kubernetes understands which PVC the PV belongs to.

    1. Retrieve the new PVC UID:

      PVCUID=$(kubectl get -n "$NAMESPACE" pvc "$PVC" -o custom-columns=UID:.metadata.uid --no-headers)
      
    2. Patch the PV with the new UID of the PVC:

      kubectl patch pv "$PV" -p '{"spec":{"claimRef":{"uid":"'$PVCUID'"}}}'
      
  6. Reset the Reclaim Policy of the volume to Delete:

    kubectl patch pv $PV -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
    
  7. Completed.

    • Verify that the volume is working correctly.
    • Update your manifests to reflect the new storageClassName.
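
A quick sanity check, using the environment variables set earlier in this guide: the PVC should be Bound and both objects should show the new storage class:

```shell
# STATUS should be Bound and STORAGECLASS should show the new class.
kubectl get pvc -n "$NAMESPACE" "$PVC"
kubectl get pv "$PV"
```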

5.5 - Ingress and cert-manager

Using Ingress resources to expose services

Follow along demo

In this piece, we show all steps to expose a web service using an Ingress resource. Additionally, we demonstrate how to enable TLS by using cert-manager to request a Let’s Encrypt certificate.

Prerequisites

  1. A DNS record pointing at the public IP address of your worker nodes. In the examples, all references to the domain example.tld must be replaced by the domain you wish to issue certificates for. Configuring DNS is out of scope for this documentation.
  2. For clusters created on or after Kubernetes 1.26, you need to ensure there is an Ingress controller and cert-manager installed.

Create resources

Create a file called ingress.yaml with the following content:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-web-service
  name: my-web-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-web-service
  template:
    metadata:
      labels:
        app: my-web-service
    spec:
      securityContext:
        runAsUser: 1001
        fsGroup: 1001
      containers:
      - image: k8s.gcr.io/serve_hostname
        name: servehostname
        ports:
        - containerPort: 9376
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: my-web-service
  name: my-web-service
spec:
  ports:
  - port: 9376
    protocol: TCP
    targetPort: 9376
  selector:
    app: my-web-service
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-web-service-ingress
  annotations:
    cert-manager.io/issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.tld
    secretName: example-tld
  rules:
  - host: example.tld
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-web-service
            port:
              number: 9376

Then create the resources in the cluster by running: kubectl apply -f ingress.yaml

Run kubectl get ingress and you should see output similar to this:

NAME                     CLASS   HOSTS         ADDRESS         PORTS     AGE
my-web-service-ingress   nginx   example.tld   91.197.41.241   80, 443   39s

If not, wait a while and try again. Once you see output similar to the above you should be able to reach your service at http://example.tld.
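
The serve_hostname container simply responds with the name of the pod that answered, so a quick check from a terminal could look like this (replace example.tld with your own domain):

```shell
# Each request returns the hostname of the pod that served it.
curl http://example.tld/
```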

Exposing TCP services

If you wish to expose TCP services, note that the tcp-services ConfigMap is located in the default namespace in our clusters.

Enabling TLS

A simple way to enable TLS for your service is by requesting a certificate using the Let’s Encrypt CA. This only requires a few simple steps.

Begin by creating a file called issuer.yaml with the following content:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt ACME server for production certificates
    server: https://acme-v02.api.letsencrypt.org/directory
    # This email address will get notifications if failure to renew certificates happens
    email: valid-email@example.tld
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx

Replace the email address with your own. Then create the Issuer in the cluster by running: kubectl apply -f issuer.yaml

Next edit the file called ingress.yaml from the previous example and make sure the Ingress resource matches the example below:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-web-service-ingress
  annotations:
    cert-manager.io/issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.tld
    secretName: example-tld
  rules:
  - host: example.tld
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-web-service
            port:
              number: 9376

Make sure to replace all references to example.tld by your own domain. Then update the resources by running: kubectl apply -f ingress.yaml

Wait a couple of minutes and your service should be reachable at https://example.tld with a valid certificate.
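
To inspect the certificate without a browser, you can ask openssl to print the details; the issuer should be Let's Encrypt (replace example.tld with your own domain):

```shell
# Print subject, issuer and validity dates of the served certificate.
openssl s_client -connect example.tld:443 -servername example.tld </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```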

Network policies

If you are using network policies you will need to add a networkpolicy that allows traffic from the ingress controller to the temporary pod that performs the HTTP challenge. With the default NGINX Ingress Controller provided by us this policy should do the trick.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: letsencrypt-http-challenge
spec:
  policyTypes:
  - Ingress
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  ingress:
  - ports:
    - port: http
    from:
    - namespaceSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx

Advanced usage

For more advanced use cases, please refer to the documentation provided by each project or contact our support.

5.6 - Install and upgrade cert-manager

A guide showing you how to install, upgrade and remove cert-manager

Starting at Kubernetes version v1.26, our default configured clusters are delivered without cert-manager.

This guide will help you get a working, up-to-date cert-manager and provides instructions for upgrading and deleting it. Running your own is useful if you want full control.

The guide is based on the cert-manager Helm chart, found here. We take advantage of the option to install CRDs with kubectl, as recommended for a production setup.

Prerequisites

Helm needs to be provided with the correct repository:

  1. Setup helm repo

    helm repo add jetstack https://charts.jetstack.io --force-update
    
  2. If you have a namespace named elx-cert-manager (from the previously managed installation), you first need to remove its resources:

    kubectl -n elx-cert-manager delete svc cert-manager cert-manager-webhook
    kubectl -n elx-cert-manager delete deployments.apps cert-manager cert-manager-cainjector cert-manager-webhook
    kubectl delete namespace elx-cert-manager
    

Install

  1. Install the CRDs:

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.crds.yaml
    
  2. Run Helm install:

    helm install \
      cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --version v1.14.4
    

    A full list of available Helm values is on cert-manager’s ArtifactHub page.

  3. Verify the installation with cmctl (the cert-manager CLI, https://cert-manager.io/docs/reference/cmctl/#installation):

    cmctl check api
    

    If everything is working you should get the message: The cert-manager API is ready.

Upgrade

The setup used above is referenced in the topic “CRDs managed separately”.

In these examples <version> is “v1.14.4”.

  1. Update the CRDs:

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/<version>/cert-manager.crds.yaml
    
  2. Update the Helm chart:

    helm upgrade cert-manager jetstack/cert-manager --namespace cert-manager --version <version>
    

Uninstall

To uninstall, use the guide here.

5.7 - Install and upgrade ingress-nginx

A guide showing you how to install, upgrade and remove ingress-nginx.

This guide will help you get a working, up-to-date ingress controller and provides instructions for upgrading and deleting it. Running your own is useful if you want full control.

The guide is based on the ingress-nginx Helm chart, found here.

Prerequisites

Helm needs to be provided with the correct repository:

  1. Setup helm repo

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    
  2. Make sure to update repo cache

    helm repo update
    

Generate values.yaml

We provide settings for two main scenarios of how clients connect to the cluster. The configuration file, values.yaml, must reflect the correct scenario.

  • Customer connects directly to the Ingress:

    controller:
      kind: DaemonSet
      metrics:
        enabled: true
      service:
        enabled: true
        annotations:
          loadbalancer.openstack.org/proxy-protocol: "true"
      ingressClassResource:
        default: true
      publishService:
        enabled: false  
      allowSnippetAnnotations: true
      config:
        use-proxy-protocol: "true"
    defaultBackend:
      enabled: true
    
  • Customer connects via Proxy:

    controller:
      kind: DaemonSet
      metrics:
        enabled: true
      service:
        enabled: true
        #loadBalancerSourceRanges:
        #  - <Proxy(s)-CIDR>
      ingressClassResource:
        default: true
      publishService:
        enabled: false  
      allowSnippetAnnotations: true
      config:
        use-forwarded-headers: "true"
    defaultBackend:
      enabled: true
    
  • Other useful settings:

    For a complete set of options see the upstream documentation here.

      [...]
      service:
        loadBalancerSourceRanges:        # Whitelist source IPs.
          - 133.124.../32
          - 122.123.../24
        annotations:
          loadbalancer.openstack.org/keep-floatingip: "true"  # retain floating IP in floating IP pool.
          loadbalancer.openstack.org/flavor-id: "v1-lb-2"     # specify flavor.
      [...]
    

Install ingress-nginx

Use the values.yaml generated in the previous step.

helm install ingress-nginx ingress-nginx/ingress-nginx --values values.yaml --namespace ingress-nginx --create-namespace

Example output:

NAME: ingress-nginx
LAST DEPLOYED: Tue Jul 18 11:26:17 2023
NAMESPACE: ingress-nginx
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
It may take a few minutes for the Load Balancer IP to become available.
You can watch the status by running 'kubectl --namespace ingress-nginx get services -o wide -w ingress-nginx-controller'
[..]

Upgrade ingress-nginx

Use the values.yaml generated in the previous step.

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx --values values.yaml --namespace ingress-nginx

Example output:

Release "ingress-nginx" has been upgraded. Happy Helming!
NAME: ingress-nginx
LAST DEPLOYED: Tue Jul 18 11:29:41 2023
NAMESPACE: ingress-nginx
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
It may take a few minutes for the Load Balancer IP to be available.
You can watch the status by running 'kubectl --namespace ingress-nginx get services -o wide -w ingress-nginx-controller'
[..]

Remove ingress-nginx

The best practice is to use the helm template method to remove the ingress controller, as it allows for proper removal of lingering resources. Afterwards, remove the namespace. Use the values.yaml generated in the previous step.

Note: Avoid running multiple ingress controllers using the same IngressClass.
See more information here.

  1. Run the delete command

    helm template ingress-nginx ingress-nginx/ingress-nginx --values values.yaml --namespace ingress-nginx | kubectl delete -f -
    
  2. Remove the namespace if necessary

    kubectl delete namespace ingress-nginx
    

5.8 - Load balancers

Using a load balancer to expose services in the cluster

Load balancers in our Elastx Kubernetes CaaS service are provided by OpenStack Octavia in collaboration with the Kubernetes Cloud Provider OpenStack. This article will introduce the basics of how to use services of type LoadBalancer to expose services using OpenStack Octavia load balancers. For more advanced use cases you are encouraged to read the official documentation of each project or to contact our support for assistance.

A quick example

Exposing a service with type LoadBalancer will give you a unique public IP backed by an OpenStack Octavia load balancer. This example will take you through the steps for creating such a service.

Create the resources

Create a file called lb.yaml with the following content:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: echoserver
  name: echoserver
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: echoserver
  template:
    metadata:
      labels:
        app.kubernetes.io/name: echoserver
    spec:
      containers:
      - image: gcr.io/google-containers/echoserver:1.10
        name: echoserver
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: echoserver
  name: echoserver
  annotations:
    loadbalancer.openstack.org/x-forwarded-for: "true"
    loadbalancer.openstack.org/flavor-id: 552c16df-dcc1-473d-8683-65e37e094443
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
    name: http
  selector:
    app.kubernetes.io/name: echoserver
  type: LoadBalancer

Then create the resources in the cluster by running: kubectl apply -f lb.yaml

You can watch the load balancer being created by running: kubectl get svc

This should output something like:

NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
echoserver   LoadBalancer   10.233.32.83   <pending>     80:30838/TCP   6s
kubernetes   ClusterIP      10.233.0.1     <none>        443/TCP        10h

The output in the EXTERNAL-IP column tells us that the load balancer has not yet been completely created.

We can investigate further by running: kubectl describe svc echoserver

Output should look something like this:

Name:                     echoserver
Namespace:                default
Labels:                   app.kubernetes.io/name=echoserver
Annotations:              loadbalancer.openstack.org/x-forwarded-for: true
Selector:                 app.kubernetes.io/name=echoserver
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.233.32.83
IPs:                      10.233.32.83
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  30838/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  115s  service-controller  Ensuring load balancer

Looking at the Events section near the bottom we can see that the Cloud Controller has picked up the order and is provisioning a load balancer.

Running the same command again (kubectl describe svc echoserver) after waiting some time should produce output like:

Name:                     echoserver
Namespace:                default
Labels:                   app.kubernetes.io/name=echoserver
Annotations:              loadbalancer.openstack.org/x-forwarded-for: true
Selector:                 app.kubernetes.io/name=echoserver
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.233.32.83
IPs:                      10.233.32.83
LoadBalancer Ingress:     91.197.41.223
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  30838/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age    From                Message
  ----    ------                ----   ----                -------
  Normal  EnsuringLoadBalancer  8m52s  service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   6m43s  service-controller  Ensured load balancer

Again looking at the Events section we can tell that the Cloud Provider has provisioned the load balancer for us (the EnsuredLoadBalancer event). Furthermore we can see the public IP address associated with the service by checking the LoadBalancer Ingress.

Finally to verify that the load balancer and service are operational run: curl http://<IP address from LoadBalancer Ingress>

Your output should look something like:

Hostname: echoserver-84655f4656-sc4k6

Pod Information:
        -no pod information available-

Server values:
        server_version=nginx: 1.13.3 - lua: 10008

Request Information:
        client_address=10.128.0.3
        method=GET
        real path=/
        query=
        request_version=1.1
        request_scheme=http
        request_uri=http://91.197.41.223:8080/

Request Headers:
        accept=*/*
        host=91.197.41.223
        user-agent=curl/7.68.0
        x-forwarded-for=213.179.7.4

Request Body:
        -no body in request-

Things to note:

  • You do not need to modify security groups when exposing services using load balancers.
  • The client_address is the address of the load balancer and not the client making the request; you can find the real client address in the x-forwarded-for header.
  • The x-forwarded-for header is provided by setting the loadbalancer.openstack.org/x-forwarded-for: "true" on the service. Read more about available annotations in the Advanced usage section.

Advanced usage

For more advanced use cases, please refer to the documentation provided by each project or contact our support.

Good to know

Load balancers are billable resources

Adding services of type LoadBalancer will create load balancers in OpenStack, which are billable resources, so you will be charged for them.

Loadbalancer statuses

Load balancers within OpenStack have two distinct statuses, which may cause confusion regarding their meanings:

  • Provisioning Status: This status reflects the overall condition of the load balancer itself. If any issues arise with the load balancer, this status will indicate them. Should you encounter any problems with this status, please don’t hesitate to contact Elastx support for assistance.
  • Operating Status: This status indicates the health of the configured backends, typically referring to the nodes within your cluster, especially when health checks are enabled (which is the default setting). It’s important to note that a degraded operating status doesn’t necessarily imply a problem, as it depends on your specific configuration. If a service is only exposed on a single node, for instance, this is to be expected, since load balancers by default distribute traffic across all cluster nodes.

Provisioning status codes

Code             Description
ACTIVE           The entity was provisioned successfully
DELETED          The entity has been successfully deleted
ERROR            Provisioning failed
PENDING_CREATE   The entity is being created
PENDING_UPDATE   The entity is being updated
PENDING_DELETE   The entity is being deleted

Operating status codes

Code         Description
ONLINE       Entity is operating normally; all pool members are healthy
DRAINING     The member is not accepting new connections
OFFLINE      Entity is administratively disabled
DEGRADED     One or more of the entity’s components are in ERROR
ERROR        The entity has failed; the member is failing its health monitoring checks; all of the pool members are in ERROR
NO_MONITOR   No health monitor is configured for this entity and its status is unknown

High availability properties

OpenStack Octavia load balancers are placed in two of our three availability zones. This is a limitation imposed by the OpenStack Octavia project.

Reconfiguring using annotations

Reconfiguring load balancers using annotations is not as dynamic and smooth as one would hope. For now, to change the configuration of a load balancer, the service needs to be deleted and a new one created.

Loadbalancer protocols

Load balancers support multiple protocols. In general, we recommend avoiding http and https, simply because they do not perform as well as other protocols.

Instead, use tcp or HAProxy's proxy protocol, and run an ingress controller that is responsible for proxying and TLS termination within the cluster.
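As a minimal sketch, a tcp LoadBalancer service with the proxy protocol enabled could look like this (service and app names are placeholders; the annotation is the same one used in the ingress-nginx values later in this documentation):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-tcp-service
  annotations:
    # Preserve client source IPs using the HAProxy proxy protocol
    loadbalancer.openstack.org/proxy-protocol: "true"
spec:
  type: LoadBalancer
  selector:
    app: my-application
  ports:
  - name: tcp-8080
    port: 8080
    protocol: TCP
    targetPort: 8080
```

Note that the backend receiving the traffic must understand the proxy protocol, as an ingress controller configured with use-proxy-protocol: "true" does.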

Load Balancer Flavors

Load balancers come in multiple flavors. The biggest difference is how much traffic they can handle. If no flavor is specified, we default to v1-lb-1. However, this flavor can only push around 200 Mbit/s. For customers needing more throughput, we have a couple of flavors to choose from:

ID                                    Name     Specs      Approx. traffic
16cce6f9-9120-4199-8f0a-8a76c21a8536  v1-lb-1  1G, 1 CPU  200 Mbit/s
48ba211c-20f1-4098-9216-d28f3716a305  v1-lb-2  1G, 2 CPU  400 Mbit/s
b4a85cd7-abe0-41aa-9928-d15b69770fd4  v1-lb-4  2G, 4 CPU  800 Mbit/s
1161b39a-a947-4af4-9bda-73b341e1ef47  v1-lb-8  4G, 8 CPU  1600 Mbit/s

To select a flavor for your Load Balancer, add the following to the Kubernetes Service .metadata.annotations:

loadbalancer.openstack.org/flavor-id: <id-of-your-flavor>

Note that this is a destructive operation when modifying an existing Service; it will remove the current Load Balancer and create a new one (with a new public IP).

Full example configuration for a basic LoadBalancer service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    loadbalancer.openstack.org/flavor-id: b4a85cd7-abe0-41aa-9928-d15b69770fd4
  name: my-loadbalancer
spec:
  ports:
  - name: http-80
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: my-application
  type: LoadBalancer

5.9 - Persistent volumes

Using persistent volumes

Persistent volumes in our Elastx Kubernetes CaaS service are provided by OpenStack Cinder. Volumes are dynamically provisioned by Kubernetes Cloud Provider OpenStack.

Storage classes

The number in a storage class name refers to its IOPS; 8k, for example, refers to 8000 IOPS.

See our pricing page under the table Storage to calculate your costs.

Below is the list of storage classes provided in newly created clusters. If you see other storage classes in your cluster, consider them legacy and please migrate data away from them. We provide a guide to Change PV StorageClass.

$ kubectl get storageclasses
NAME              PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
v2-128k           cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-16k            cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-1k (default)   cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-32k            cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-4k             cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-64k            cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d
v2-8k             cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   27d

Example of PersistentVolumeClaim

A quick example of how to create an unused 1Gi persistent volume claim named example:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: v2-16k

Apply the manifest with kubectl apply -f, then verify:

$ kubectl get persistentvolumeclaim
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
example   Bound    pvc-f8b1dc7f-db84-11e8-bda5-fa163e3803b4   1Gi        RWO            v2-16k            18s
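To use the claim, reference it from a pod spec; a minimal sketch (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example  # the PVC created above
```

Since the storage classes use the WaitForFirstConsumer volume binding mode, the underlying volume is provisioned once the pod is scheduled.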

Good to know

Cross mounting of volumes between nodes

Cross mounting of volumes is not supported! That is, a volume can only be mounted by a node residing in the same availability zone as the volume. Plan accordingly to ensure high availability!

Limit of volumes and pods per node

If a higher number of volumes or pods is required, consider adding additional worker nodes.

Kubernetes version   Max pods/node   Max volumes/node
v1.25 and lower      110             25
v1.26 and higher     110             125

Encryption

All volumes are encrypted at rest in hardware.

Volume type hostPath

A volume of type hostPath is in reality just a local directory on the specific node, mounted into a pod. This means data is stored locally and will be unavailable if the pod is ever rescheduled onto another node. This is expected during cluster upgrades or maintenance, but it may also occur for other reasons, for example if a pod crashes or a node is malfunctioning. Malfunctioning nodes are automatically healed, meaning they are automatically replaced.
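For illustration, a minimal hostPath pod could look like this (pod name, image, and path are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-example
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: local-dir
      mountPath: /data
  volumes:
  - name: local-dir
    hostPath:
      path: /var/local/app-data  # a directory on the node itself
      type: DirectoryOrCreate
```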

You can read more about hostpath here.

If you are looking for a way to store persistent data, we recommend using PVCs. PVCs can move between nodes within one data center, meaning any stored data will remain available even if the pod or node is recreated.

Known issues

Resizing encrypted volumes

Legacy: encrypted volumes do not resize properly; please contact our support if you wish to resize such a volume.

5.10 - Kubernetes API whitelist

How to limit access to the kubernetes API

In our Kubernetes services, we rely on OpenStack load balancers in front of the control planes to ensure traffic is sent to a functional node. Whitelisting of access to the API server is controlled in the load balancer in front of the API. Currently, managing the IP-range whitelist requires a support ticket here.

Please submit a ticket with the CIDR ranges for the IPs you wish to whitelist. We are happy to help you ASAP.

Note: All Elastx IP ranges are always included.

In the future, we expect to have this functionality available self-service style.

6 - Knowledge base

Articles on specific issues/subjects

6.1 - Migration to Kubernetes CaaS v2

Everything you need to know and prepare prior to migrating your cluster to Kubernetes CaaS v2

Please note: this document was last updated 2024-03-05.

This document will guide you through all new changes introduced when migrating to our new Kubernetes deployment backend. All customers with a Kubernetes cluster created on Kubernetes 1.25 or earlier are affected.

We have received, and acted upon, customer feedback since our main announcement in 2023 Q4. We provide two additional paths to reach v1.26:

  • We’ve reverted to continue providing Ingress/Certmanager.
  • To assist with your transition we can offer you an additional cluster (v1.26 or latest version) up to 30 days at no extra charge.


All customers will receive this information when we upgrade clusters to v1.26, which also includes the migration procedure. Make sure to carefully read through and understand the procedure and changes in order to avoid potential downtime during the upgrade.

Pre-Upgrade Information:

  • The following overall steps are crucial for a seamless upgrade process:

    1. Date for the upgrade is agreed upon.
    2. For users of Elastx managed ingress opting to continue with our management services:
      • Elastx integrates a load balancer into the ingress service. The load balancer is assigned an external IP-address that will be used for all DNS records post-transition (do not point DNS to this IP at this point).
      • Date of the traffic transition to the load balancer is agreed upon.
  • Important Note Before the Upgrade:

    • Customers are required to carefully read and comprehend all changes outlined in the migration documentation to avoid potential downtime or disruptions.
    • In case of any uncertainties or challenges completing the steps, please contact Elastx support. We are here to assist and can reschedule the upgrade to a more suitable date if needed.

To facilitate a seamless traffic transition, we recommend the following best practices:

  • Utilize CNAMEs when configuring domain pointers for the ingress. This approach ensures that only one record needs updating, enhancing efficiency.
  • Prior to implementing the change, verify that the CNAME record has a low Time-To-Live (TTL), with a duration of typically 1 minute, to promote rapid propagation.
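As a sketch, a zone-file record following these recommendations (domain names are placeholders) might look like:

```
; 60 second TTL so the change propagates quickly
www.example.tld.  60  IN  CNAME  ingress.example.tld.
```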

During the traffic transition:

  1. Elastx will meticulously update the ingress service configuration to align with your specific setup.
  2. The customer is responsible for updating all DNS records or proxies to effectively direct traffic towards the newly implemented load balancer.

During the Upgrade:

  • Elastx assumes all necessary pre-upgrade changes have been implemented unless notified otherwise.
  • On the scheduled upgrade day, Elastx initiates the upgrade process at the agreed-upon time.
  • Note: The Kubernetes API will be temporarily unavailable during the upgrade due to migration to a load balancer.
  • Upgrade Procedure:
    • The upgrade involves replacing all nodes in your cluster twice.
    • Migration to the new cluster management backend system will occur during Kubernetes 1.25, followed by the cluster upgrade to Kubernetes 1.26.

After Successful Upgrade:

  • Users are advised to download a new kubeconfig from the object store for continued access and management.

Possibility to get a new cluster instead of migrating

To address the growing demand for new clusters rather than upgrades, customers currently running Kubernetes 1.25 (or earlier) can opt for a new Kubernetes cluster instead of migrating their existing one. The new cluster can be of version 1.26 or the latest available (1.29 at the moment). This new cluster is provided free of charge for an initial 30-day period, allowing you the flexibility to migrate your services at your own pace. However, if the migration extends beyond 30 days, please note that you will be billed for both clusters during the extended period. We understand the importance of a smooth transition, and our support team is available to assist you throughout the process.

Ingress

We are updating the way clusters accept incoming traffic by transitioning from accepting traffic on each worker node to utilizing a load balancer. This upgrade, effective from Kubernetes 1.26 onwards, offers automatic addition and removal of worker nodes, providing enhanced fault management and a single IP address for DNS and/or WAF configuration.

Before upgrading to Kubernetes 1.26, a migration to the new load balancer is necessary. In order to set up the components correctly, we need to understand your configuration specifics. Please review your scenario:


Using Your Own Ingress

If you manage your own add-ons, you can continue doing so. Starting from Kubernetes 1.26, clusters will no longer have public IP addresses on all nodes by default. We strongly recommend implementing a load balancer in front of your ingress for improved fault tolerance, as it handles node failures far better than relying on clients such as web browsers to retry other addresses.

Elastx managed ingress

If you are using the Elastx managed ingress, additional details about your setup are required.

Proxy Deployed in Front of the Ingress (CDN, WAF, etc.)

If a proxy is deployed, provide information on the IP addresses used by your proxy. We rely on this information to trust the x-forwarded-* headers. By default, connections that do not come from your proxy are blocked directly on the load balancer, forcing clients to connect through your proxy.

Clients Connect Directly to Your Ingress

If clients connect directly to the ingress, we will redirect them to the new ingress. To maintain client source IPs, we utilize HAProxy proxy protocol in the load balancer. However, during the change, traffic will only be allowed to the load balancer for approximately 1-2 minutes. Please plan accordingly, as some connections may experience downtime during this transition.

Floating IPs

Floating IPs (FIPs) are now available for customers who choose to opt in. As part of the upgrade to Kubernetes 1.26, floating IPs will be removed from nodes by default. Instead, Load Balancers will be employed to efficiently direct traffic to services within the cluster.

Please note that current floating IPs will be lost if customers do not opt in for this feature during the upgrade process.

Should you wish to continue utilizing Floating IPs or enable them in the future, simply inform us, and we’ll ensure to assist you promptly.

A primary use case where Floating IPs prove invaluable is in retaining control over egress IP from the cluster. Without leveraging FIPs, egress traffic will be SNAT’ed via the hypervisor.

Kubernetes API

We are removing floating IPs from all control-plane nodes. Instead, we use a load balancer in front of the control planes to ensure traffic is sent to a working control-plane node.

Whitelisting of access to the API server is now controlled in the loadbalancer in front of the API. Currently, managing the IP-range whitelist requires a support ticket here. All Elastx IP ranges are always included.

Node local DNS

During the Kubernetes 1.26 upgrade we stop using nodelocaldns. However, to ensure we do not break any existing clusters, the service will remain installed.

All nodes added to a cluster running Kubernetes 1.26 or later will not make use of nodelocaldns; pods created on upgraded nodes will instead use the CoreDNS service located in kube-system.

This may affect customers that make use of network policies. If a policy only allows traffic to nodelocaldns, you must update it to also allow traffic to the CoreDNS service.

Network policy to allow CoreDNS and NodeLocalDNS Cache

This example allows DNS traffic towards both NodeLocalDNS and CoreDNS. This policy is recommended for customers currently only allowing DNS traffic towards NodeLocalDNS and can be used in a “transition phase” prior to upgrading to Kubernetes 1.26.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-access
spec:
  podSelector: {}
  egress:
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
      to:
        - ipBlock:
            cidr: 169.254.25.10/32
        - podSelector:
            matchLabels:
              k8s-app: kube-dns
  policyTypes:
    - Egress

Network policy to allow CoreDNS

This example shows a network policy that allows DNS traffic to CoreDNS. It can be used after the upgrade to Kubernetes 1.26.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-access
spec:
  podSelector: {}
  egress:
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
      to:
        - podSelector:
            matchLabels:
              k8s-app: kube-dns
  policyTypes:
    - Egress