How we went from kops to EKS in production

Amazon EKS (Elastic Container Service for Kubernetes) lets you create a Kubernetes control plane in your AWS account without having to configure Kubernetes master nodes, etcd, or the API servers. In this blog post we will cover the motivation for using EKS, the preparation required to create an EKS cluster, how to configure EKS in Terraform, and how to set up kube2iam with EKS.

In this blog:

Why we migrated

How we prepared

Configuring with the Terraform EKS module

Setting up kube2iam

Monitoring

Why we migrated to EKS

At Blue Matador, our production Kubernetes cluster has so far been managed using kops. I recently wrote a blog post comparing EKS and kops and decided that it was time to use EKS in our production cluster for the following reasons:

The Amazon EKS Shared Responsibility Model means that I am not alone in securing my Kubernetes cluster.
I prefer not to worry about a possible etcd or API server outage in our kops cluster and having to debug it myself.
Our kops setup used 3 c4.large instances for the master nodes; EKS ended up being cheaper for us once we factored in EBS volumes, hourly instance prices, and the network traffic used by the master nodes.
At Blue Matador we pride ourselves on having first-hand experience with both AWS and Kubernetes, and running EKS in production, like many of our users do, will help us better understand and monitor it.

Prepping for the move

One of the drawbacks of using EKS is that you must have a few things in place to correctly use the EKS cluster and authenticate with it. A lot of the more tedious aspects of configuring EKS can be handled with Terraform or CloudFormation, but a few items must be addressed before you can begin.

First, make sure you have installed kubectl version 1.12 or higher. Then install the AWS CLI, version 1.16.73 or higher, followed by aws-iam-authenticator. Together these allow you to authenticate to the EKS cluster using IAM.
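The AWS CLI requirement is easy to get wrong on a machine with an older install, so here is a minimal sketch of a version check in shell. The helper names (version_ge, parse_aws_cli_version, check_aws_cli) are made up for this post, and the comparison assumes a sort that supports -V, as GNU coreutils does.

```shell
# Hypothetical helpers, not part of any AWS tooling.

# Succeeds when dotted version $1 is greater than or equal to $2.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extracts "1.16.73" from a line such as "aws-cli/1.16.73 Python/2.7.15 ...".
parse_aws_cli_version() {
  printf '%s\n' "$1" | sed -n 's|^aws-cli/\([0-9][0-9.]*\).*|\1|p'
}

# Reports whether the installed AWS CLI meets the minimum from this post.
# Older CLI builds print the version on stderr, hence the 2>&1.
check_aws_cli() {
  installed="$(parse_aws_cli_version "$(aws --version 2>&1)")"
  if version_ge "$installed" "1.16.73"; then
    echo "AWS CLI $installed is new enough"
  else
    echo "AWS CLI $installed is older than 1.16.73" >&2
    return 1
  fi
}
```

Running check_aws_cli after sourcing the snippet reports the result. For the other two tools, kubectl version --client prints the client version, and aws-iam-authenticator help is a quick way to confirm the binary is on your PATH.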

The next piece you will need is a VPC (Amazon Virtual Private Cloud) to run your Kubernetes cluster in. AWS recommends creating a new VPC, and this can be done using CloudFormation or Terraform. You can also use an existing VPC as long as it is appropriately configured. In either case, ensure that your VPC meets the following requirements if you want to match the setup I will be using in Terraform:

Create both a public and private subnet in each of 3 availability zones. This will give your cluster high availability and protect your worker nodes from the public internet.

Public subnets are for resources that will be addressable on the public internet, such as an Application Load Balancer. Ensure each public subnet has a route to the internet using an Internet Gateway.
Private subnets are for resources that should not be accessible from the internet, such as your worker nodes. Ensure each private subnet has a route to the internet using a NAT Gateway.

Tag your public and private subnets so that Kubernetes knows where to create load balancers.

Public subnets should be tagged with Key kubernetes.io/role/elb and Value 1

Private subnets should be tagged with Key kubernetes.io/role/internal-elb and Value 1

Make sure your subnets are sufficiently large to run your workload. Kubernetes clusters created using EKS will use the IP address space in your subnets for your pods, so using small subnets could limit the number of pods you are able to run in your cluster.
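If you are adapting an existing VPC, the tagging and sizing requirements above can be sketched with the AWS CLI and a little arithmetic. The helper names tag_subnets and usable_ips are made up for this post and the subnet IDs are placeholders. The sizing helper relies on the fact that AWS reserves 5 addresses in every subnet.

```shell
# Hypothetical helpers; the subnet IDs passed in are placeholders.

# Tags subnets so Kubernetes knows where to place load balancers.
# Usage: tag_subnets public|private SUBNET_ID...
tag_subnets() {
  case "$1" in
    public)  tag_key="kubernetes.io/role/elb" ;;
    private) tag_key="kubernetes.io/role/internal-elb" ;;
    *) echo "first argument must be public or private" >&2; return 1 ;;
  esac
  shift
  aws ec2 create-tags --resources "$@" --tags "Key=${tag_key},Value=1"
}

# Prints how many addresses a subnet with the given prefix length can hand out.
# AWS reserves 5 addresses in every subnet.
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}
```

For example, tag_subnets private subnet-000001 subnet-000002 subnet-000003 tags the three private subnets, and usable_ips 24 prints 251, so three /24 private subnets leave room for roughly 750 node and pod addresses in total.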

Configuring EKS with the EKS Terraform module

We use Terraform for most of our infrastructure management, so I will cover the specific Terraform configuration used to set up EKS in my case. If you are not a Terraform user, you can follow the instructions in this blog post to set up EKS using CloudFormation and the AWS CLI. Even if you do not use Terraform, I recommend reading this section since I also cover using encrypted EBS volumes and setting up basic IAM permissions in Terraform.

Configuring EKS in Terraform is made much simpler by using the EKS Terraform module, which handles most of the IAM, EKS, and kubectl configuration for you. We will create a cluster using version 1.12 of Kubernetes and version 3.0.0 of the EKS module, which requires at least version 2.7.0 of the AWS provider.

provider "aws" {
  region  = "us-east-1"
  version = "2.7.0"
}

The first thing we need to do is copy the AMI used for the EKS workers so that we can get an encrypted AMI. Having an encrypted AMI allows us to have encrypted root devices on our EKS workers, a common security requirement for many applications. This can be accomplished using the following Terraform config:

data "aws_ami" "eks_worker_base" {
  filter {
    name = "name"
    values = ["amazon-eks-node-1.12-*"]
  }

  most_recent = true

  # Owner ID of AWS EKS team
  owners = ["602401143452"]
}

resource "aws_ami_copy" "eks_worker" {
  name              = "${data.aws_ami.eks_worker_base.name}-encrypted"
  description       = "Encrypted version of EKS worker AMI"
  source_ami_id     = "${data.aws_ami.eks_worker_base.id}"
  source_ami_region = "us-east-1"
  encrypted         = true

  tags = {
    Name = "${data.aws_ami.eks_worker_base.name}-encrypted"
  }
}

Next, we will configure the EKS module. We specify the id of the VPC we are using and the subnet ids for our public and private subnets so the EKS cluster is created in the correct network. We’ve also specified a few options to help with setting up aws-iam-authenticator. After the EKS cluster is created, Terraform will automatically set up IAM permissions for the specified roles and users, and create a kubeconfig file configured for use with aws-iam-authenticator. We’ve also configured a group of worker nodes that uses the encrypted AMI we created to spin up 3 m4.xlarge instances. The EKS module handles all of the IAM role creation, security group setup, and worker bootstrapping for us.

module "eks_cluster" {
  source  = "terraform-aws-modules/eks/aws"
  version = "3.0.0"

  cluster_version = "1.12"

  cluster_name = "MyCluster"

  # your VPC ID
  vpc_id       = "vpc-000000"

  # The private AND public subnet ids
  subnets = [
    "subnet-000001",
    "subnet-000002",
    "subnet-000003",
    "subnet-000004",
    "subnet-000005",
    "subnet-000006"
  ]

  # Modify these to control cluster access
  cluster_endpoint_private_access = "true"
  cluster_endpoint_public_access  = "true"

  # Makes configuring aws-iam-authenticator easy 
  write_kubeconfig      = true

  # Change to wherever you want the generated kubeconfig to go
  config_output_path    = "./"

  # Makes specifying IAM users or roles with cluster access easier
  manage_aws_auth       = true
  write_aws_auth_config = true

  # Specify IAM users that you want to be able to use your cluster
  map_users_count = 2
  map_users = [
    {
      user_arn = "arn:aws:iam::12345678912:user/user1"
      username = "user1"
      group    = "system:masters"
    },
    {
      user_arn = "arn:aws:iam::12345678912:user/user2"
      username = "user2"
      group    = "system:masters"
    },
  ]

  # If you use IAM roles to federate IAM access, specify the roles you want to be able to use your cluster
  map_roles_count = 1
  map_roles = [
    {
      role_arn = "arn:aws:iam::12345678912:role/role1"
      username = "role1"
      group    = "system:masters"
    },
  ]

  # This creates an autoscaling group for your workers
  worker_group_count = 1
  worker_groups = [
    {
      name                 = "workers"
      instance_type        = "m4.xlarge"
      asg_min_size         = 3
      asg_desired_capacity = 3
      asg_max_size         = 3
      root_volume_size     = 100
      root_volume_type     = "gp2"

      # This evaluates to our encrypted AMI from before!
      ami_id               = "${aws_ami_copy.eks_worker.id}"

      # Specify the SSH key pair you wish to use 
      key_name          = "all"

      # Optionally enable 1-minute CloudWatch metrics
      enable_monitoring = false

      # Specify your private subnets here as a comma-separated string
      subnets = "subnet-000001,subnet-000002,subnet-000003"
    },
  ]

  # Add any other tags you wish to the resources created by this module
  tags = {
    Cluster = "MyCluster"
  }
}

output "worker_role_arn" {
  value = "${module.eks_cluster.worker_iam_role_arn}"
}

At this point, you should be able to run terraform plan to see what resources will be created, and then terraform apply to create the resources. The EKS module supports many more options; you can check those out in the examples directory in GitHub.

It will take a few minutes for the encrypted AMI to be created, and then upwards of 15 minutes for your EKS cluster and workers to be ready. I’ve added an output for the ARN of the IAM role that is created and used by your workers. This ARN will be needed in the next step to set up kube2iam in your cluster.
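The plan-then-apply workflow described above might look like the following sketch. The function names plan_eks and apply_eks and the plan file name eks.tfplan are made up for this post. terraform init is included because the module and provider need to be downloaded before the first plan.

```shell
# Hypothetical wrappers around the Terraform workflow described above.
# eks.tfplan is an arbitrary file name for the saved plan.

# Downloads the EKS module and AWS provider, then saves a plan for review.
plan_eks() {
  terraform init && terraform plan -out=eks.tfplan
}

# Applies exactly the plan that was saved, then prints the worker role ARN
# that is needed later for kube2iam.
apply_eks() {
  terraform apply eks.tfplan && terraform output worker_role_arn
}
```

Saving the plan to a file and applying that file means the changes you reviewed are the changes that get made, which is worth the extra step for a production cluster.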

Since we configured the EKS module to output a kubeconfig file, we need to configure kubectl to use that file to authenticate with our cluster, then switch our context to that cluster. The generated file will be called kubeconfig_MyCluster, where MyCluster is the cluster_name specified in the EKS module. kubectl reads the KUBECONFIG environment variable as a colon-separated list of kubeconfig files, so we can append the new file like so:

export KUBECONFIG=$KUBECONFIG:/path/to/kubeconfig_MyCluster
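One caveat, as I understand kubectl's loading rules: if KUBECONFIG was not set beforehand, the export above produces a value that starts with a colon, and kubectl then stops reading the default ~/.kube/config. A slightly more defensive version is sketched below. The function name add_kubeconfig is made up for this post.

```shell
# Hypothetical helper: appends a kubeconfig file to KUBECONFIG.
# Falls back to the default ~/.kube/config when KUBECONFIG is unset or empty,
# and skips the append when the file is already in the list.
add_kubeconfig() {
  kubeconfig_new="$1"
  kubeconfig_list="${KUBECONFIG:-$HOME/.kube/config}"
  case ":$kubeconfig_list:" in
    *":$kubeconfig_new:"*) ;;                                   # already listed
    *) kubeconfig_list="$kubeconfig_list:$kubeconfig_new" ;;    # append it
  esac
  export KUBECONFIG="$kubeconfig_list"
}
```

Calling add_kubeconfig /path/to/kubeconfig_MyCluster has the same effect as the export above when KUBECONFIG is already set, and keeps your existing contexts available when it is not.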

Then, switch your kubectl context so you can test connectivity to the new cluster:

kubectl config use-context eks_MyCluster
kubectl cluster-info

If everything is set up correctly, you should see an output similar to:

Kubernetes master is running at https://AAAAAAAAAAAAAAAAAAAAAAAAAAAA.yl1.us-east-1.eks.amazonaws.com
CoreDNS is running at https://AAAAAAAAAAAAAAAAAAAAAAAAAAAA.yl1.us-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Setting up kube2iam

Once your cluster is set up using either Terraform or CloudFormation, I recommend installing kube2iam. Kube2iam allows you to use IAM roles to give individual pods access to your other AWS resources. Without some way to delegate IAM access to pods, you would instead have to give your worker nodes every IAM permission that your pods need, which is cumbersome to manage and a poor security practice. For more information on IAM access in Kubernetes, you can check out my blog series on the subject.

First, set up RBAC for the kube2iam service account:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube2iam
  namespace: kube-system
---
apiVersion: v1
items:
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kube2iam
    rules:
      - apiGroups: [""]
        resources: ["namespaces","pods"]
        verbs: ["get","watch","list"]
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kube2iam
    subjects:
    - kind: ServiceAccount
      name: kube2iam
      namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: kube2iam
      apiGroup: rbac.authorization.k8s.io
kind: List

Then, install the kube2iam DaemonSet. The kube2iam agent runs on each worker node and intercepts calls to the EC2 metadata API. If your pods are annotated correctly, kube2iam assumes the role specified by the pod and returns credentials for it, so your pods can access AWS resources using roles with no change to your application code. The only option that needs special attention on EKS is --host-interface, which must be set to eni+ because the Amazon VPC CNI plugin used by EKS gives pod network interfaces names that start with eni. Kube2iam has many configuration options that are documented on the GitHub repo, but this manifest will get you started:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube2iam
  namespace: kube-system
  labels:
    app: kube2iam
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      name: kube2iam
  template:
    metadata:
      labels:
        name: kube2iam
    spec:
      serviceAccountName: kube2iam
      hostNetwork: true
      containers:
        - image: jtblin/kube2iam:0.10.4
          imagePullPolicy: Always
          name: kube2iam
          args:
            - "--app-port=8181"
            - "--auto-discover-base-arn"
            - "--iptables=true"
            - "--host-ip=$(HOST_IP)"
            - "--host-interface=eni+"
            - "--auto-discover-default-role"
            - "--log-level=info"
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          ports:
            - containerPort: 8181
              hostPort: 8181
              name: http
          securityContext:
            privileged: true

Once kube2iam is set up, you can add an annotation in the pod spec of your deployments or other pod controllers to specify which IAM role should be used for the pod like so:

      annotations:
        iam.amazonaws.com/role: my-role

Make sure that the roles you are using with kube2iam have been configured so that your worker nodes can assume those roles. This is why we output the worker_role_arn from the Terraform EKS module in the last step. Modify your pod roles so they can be assumed by your worker nodes by adding the following to the Trust Relationship for each role:

    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "WORKER_ROLE_ARN"
      },
      "Action": "sts:AssumeRole"
    }

Be sure to replace WORKER_ROLE_ARN with the ARN of the IAM Role that your EKS worker nodes are configured with, not the ARN of the Instance Profile.
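If you would rather script this than edit the role in the console, the sketch below renders a complete trust policy document around the statement above. The function name render_trust_policy and the file name trust.json are made up for this post. Keep in mind that a role has a single trust policy, so applying this document replaces whatever is there now; merge in any existing statements first.

```shell
# Hypothetical helper: prints a trust policy document that lets the given
# worker role ARN assume the pod role. It contains only the one statement
# from this post, so add any statements the role already needs.
render_trust_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "$1"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Example usage, with my-role standing in for your pod role name:
#   render_trust_policy "$(terraform output worker_role_arn)" > trust.json
#   aws iam update-assume-role-policy --role-name my-role \
#     --policy-document file://trust.json
```

Reviewing trust.json before running the update command is a cheap safeguard, since a mistake here can lock your pods out of the role.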

Monitoring Kubernetes on AWS

We have covered the motivation for using EKS to create your Kubernetes cluster, the required setup to get going with EKS, how to use the Terraform EKS module to create a cluster, and how to set up kube2iam in your new EKS cluster. There are a lot of options that I did not cover at every step, and I encourage you to read the documentation for EKS and kube2iam to understand what each option does and ensure your production cluster is set up correctly and securely.

Once your cluster is configured, you can run your application code in your new EKS cluster, set up a logging solution, and set up a monitoring solution.

If you are looking for a monitoring solution for Kubernetes on AWS, consider Blue Matador.

Blue Matador automatically checks for over 25 Kubernetes events out-of-the-box. We also monitor over 20 AWS services in conjunction with Kubernetes, providing full coverage for your entire production environment with no alert configuration or tuning required.