We're switching the management model of our reference Kubernetes solution from kops to EKS. The main reason is to offload control plane management to AWS, among other benefits. If you're interested in how we compare the two K8s management solutions and why we're moving from one to the other, keep reading.
A couple of years ago we made the decision to stop building highly custom environments per customer (a.k.a. snowflakes). The previous approach was limiting innovation and wasting our customers' money. We are super focused on a specific type of workload: open technology SaaS companies. This meant we could invest in a single, re-usable platform with the necessary configurability instead. We've bundled our insights, industry best practices and best-of-breed technologies into what we call our SaaS Reference Solution. The main ingredients are Amazon Web Services and Kubernetes, complemented with best-of-breed Open Source components like Concourse CI, Vault, Prometheus, etc. It is fully implemented as infrastructure-as-code using Terraform and we use automation to deploy and manage the setups. This allows for full builds in under a day and highly reliable maintenance.
Two years ago we started using kops for managing the base of the Reference Solution as it offered a good balance between ease of management and customisation. And although it had its shortcomings, it fit the bill for a long time (1.7 → 1.11).
When AWS first announced EKS in 2018, we quickly signed up for the beta to test it out and evaluate if it would be a good replacement for kops in our reference solution.
kops is an open-source tool to set up and maintain Kubernetes clusters on AWS and GCE (GCE support is still in beta though). It manages the whole control plane of the cluster, as well as all the node pools needed. With a few simple commands you can get a Kubernetes cluster provisioned in a few minutes.
Of course, with kops you're responsible for maintaining the whole cluster, including the cluster masters and etcd, as it's not a managed service.
kops has been around the Kubernetes ecosystem for quite a while now. The earliest k8s version it supports is 1.8. It has become a very mature project and a very popular option when it comes to easily provisioning production-grade k8s clusters.
- Supports AWS and GCE
- We use a single OIDC-based SSO setup for both K8s API and dashboard authentication (Grafana, Prometheus, Kibana, ...)
- Declarative configuration
- Operational overhead of managing the control plane. The etcd v3 and Calico 3 upgrades are painful and will be disruptive
- Disruptive certificate/secret rotation. It's not possible to revoke cluster Admin credentials. See https://github.com/kubernetes/kops/blob/master/docs/rotate-secrets.md and https://github.com/kubernetes/kops/issues/1020
- Very slow rolling update mechanism. We've implemented our own rolling updates instead
- Important security patches (e.g. for CVE-2019-5736) can take a while to land in a stable kops release. We usually had to roll our own patches for critical security issues.
- Slow release cycle. It took 230 days for kops to support Kubernetes version 1.12 (the latest supported version as of this writing). Here's a spreadsheet comparing the release cycles of the main k8s managed solutions and kops (might not be fully up-to-date).
More concretely, this means that AWS abstracts and manages the Kubernetes control plane (including kube-controller-manager and, of course, etcd). With EKS you are still responsible for creating and managing the worker nodes yourself. Of course, AWS provides the necessary AMIs and CloudFormation templates to get you started. You can also use eksctl by Weaveworks, which is officially endorsed by AWS.
Compared to the other major cloud providers, AWS was somewhat late to the Kubernetes party. But when we got the chance to be part of the closed EKS beta in early 2018, as part of the AWS Advanced partnership, we immediately jumped on the opportunity to compare this solution to our own kops-based setups. At that time there were still a lot of challenges to overcome with EKS and K8s 1.9. We concluded it wouldn't be a viable platform for our reference solution yet. Some of the limitations we faced back then:
- kubectl proxy|logs|exec didn't work, making debugging quite hard
- Deploying via Helm didn't work
- There wasn't an in-place upgrade path yet for major cluster versions
Of course we provided the necessary feedback to AWS and kept a close eye on EKS development, while continuing to improve on our kops-based offering.
Things started to get more interesting when EKS with K8s 1.10 got released. Around February 2019 we re-evaluated the whole platform. Things were looking better now. After some more proofs of concept it was clear that, for us, the advantages of using EKS outweighed the main remaining challenges.
- Less operational overhead for the cluster control plane
- We still have full control over rolling worker nodes and add-on updates and we can fully manage our platform through Terraform
- Cheaper control plane: a fixed price of ~$150/mo with EKS vs. $150-$300/mo with kops (depending on cluster size)
- ISO and PCI compliance
- 99.9% SLA for API availability from AWS
- We replace the default AWS VPC CNI with Calico. As a result the control plane can't communicate with Pods (or Services) using Pod IP addresses, so using kubectl proxy to proxy requests to a Pod/Service won't work (and the related E2E tests fail)
- It's not possible to set OIDC options, so we use mixed authentication: IAM-based for the K8s API (kubectl and kubernetes-dashboard) and OIDC-based for other dashboards like Grafana, Prometheus, Kibana, ...
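To put the control plane price and the SLA from the list above into perspective, here is a small back-of-the-envelope sketch. It is only an illustration, not our billing model: the $0.20/hour EKS rate and the $0.10/hour master instance price are assumptions chosen for the example, and the helper names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def monthly_cost(hourly_rate_usd: float, instances: int = 1) -> float:
    """Monthly cost in USD for `instances` units billed at an hourly rate."""
    return hourly_rate_usd * HOURS_PER_MONTH * instances


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime per period that still meet the given SLA."""
    return (1 - sla_percent / 100) * days * 24 * 60


# Assumption: EKS control plane billed at a flat $0.20 per hour per cluster.
eks_control_plane = monthly_cost(0.20)

# Assumption: three self-managed masters at a hypothetical $0.10 per hour each.
kops_masters = monthly_cost(0.10, instances=3)

print(f"EKS control plane: ${eks_control_plane:.0f}/mo")
print(f"kops masters (3x): ${kops_masters:.0f}/mo")
print(f"99.9% SLA budget:  {allowed_downtime_minutes(99.9):.1f} min per 30 days")
```

Under these assumptions the EKS control plane comes to roughly $146 per month, in line with the ~$150 quoted above, and a 99.9% SLA leaves about 43 minutes of API downtime per 30-day month. Swap in your own instance prices and master count to compare against a self-managed setup.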
After a rocky start with EKS, we have now fully embraced the platform and expect to overcome most of the remaining challenges soon. Going forward, all new platforms we set up and maintain for our customers will use EKS as a base. In the coming months existing customers will also be offered the choice to migrate their kops-based clusters to EKS (most have already signed up for this).
Does this all sound interesting, and do you want to learn how we can help you run your workloads on Kubernetes and EKS? Contact us to find out!