Remove old EBS volumes

Rationale TL;DR

You deployed Kubernetes on AWS and you want to remove old, unattached EBS volumes.

The solution

So you deployed Kubernetes on AWS (with Kops or EKS) and you start to see EBS volumes that are no longer attached to anything. That is money you can save. However, you don’t want to go to the AWS console (who does?) to check those volumes; instead, you want a script that finds those volumes and deletes them.

aws ec2 describe-volumes --filters Name=status,Values=available | jq '.Volumes[].VolumeId' -r | xargs -L 1 -I VOLUME aws ec2 delete-volume --volume-id VOLUME

This finds every volume whose status is available, i.e., not attached to any instance. The jq command extracts the volume IDs, and xargs calls delete-volume once per ID.
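If you want to review what would be deleted before running the destructive version, a minimal dry-run sketch (same filter, but it only prints each volume’s ID, size, and creation time instead of deleting it):

aws ec2 describe-volumes --filters Name=status,Values=available | jq -r '.Volumes[] | "\(.VolumeId)\t\(.Size)GiB\t\(.CreateTime)"'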

Kops and instance groups

Rationale TL;DR

You deployed Kops and now you want to create an additional node pool

The solution

Cluster changes

First, we need to create a new instance group (what GKE calls a node pool):

kops create ig $IG_NAME --subnet $SUBNET1[,$SUBNET2,...] --name $CLUSTER_NAME --state s3://$BUCKET_NAME

This will bring up your preferred editor (is it vim?) and then you can tweak the IG as you want:

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-11T18:34:56Z"
  labels:
    kops.k8s.io/cluster: CLUSTER_NAME
  name: elasticsearch
spec:
  image: kope.io/k8s-1.17-debian-stretch-amd64-hvm-ebs-2020-07-20
  machineType: t3.medium
  maxSize: 2
  minSize: 2
  nodeLabels:
    dedicated: "true"
    service: elasticsearch
    spot: "false"
  role: Node
  subnets:
  - SUBNET1
  taints:
  - service=elasticsearch:NoSchedule

Set the maxSize and minSize values accordingly. In this example, I’m also setting some taints and node labels. The taint prevents any pod from being scheduled on these nodes unless it explicitly tolerates it, and the nodeLabels let me pin the pods of a deployment or statefulset to the nodes in this IG so they are not scheduled on a different IG. Bottom line: the nodes in this IG are reserved for one specific deployment or statefulset, as in the sketch below.
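For reference, this is roughly what the workload side looks like. A minimal sketch; the StatefulSet name and image are hypothetical, but the nodeSelector and tolerations mirror the IG’s nodeLabels and taints above:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 2
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      # Match the IG's nodeLabels so the pods land only on those nodes...
      nodeSelector:
        dedicated: "true"
        service: elasticsearch
      # ...and tolerate the IG's taint so they are allowed to land there.
      tolerations:
      - key: service
        operator: Equal
        value: elasticsearch
        effect: NoSchedule
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.9.0  # placeholder image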

And finally, we’d need to update the cluster. Because we want to use Terraform, we need to do this:

kops update cluster --name $CLUSTER_NAME --state s3://$BUCKET_NAME --target terraform --out .

After this, we need to apply the Terraform plan. I encourage you to do that through your CI/CD pipeline; otherwise, apply it manually.
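If you apply it manually, the usual Terraform workflow from the output directory is enough (review the plan before applying):

terraform init
terraform plan
terraform apply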

Finally, do a rolling update:

kops rolling-update cluster --name $CLUSTER_NAME --state s3://$BUCKET_NAME --yes
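Once the rolling update finishes, you can confirm the new nodes joined and are healthy:

kops validate cluster --name $CLUSTER_NAME --state s3://$BUCKET_NAME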

Kops and cluster autoscaler

Rationale TL;DR

You deployed Kops and now you want your cluster to autoscale

The solution

Cluster changes

First, set the max and min sizes for your instance groups:

kops edit instancegroups nodes --name $CLUSTER_NAME --state s3://$BUCKET_NAME

Set the maxSize and minSize values accordingly.
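For example, to let the group scale between 2 and 10 nodes (the numbers here are just an illustration), the spec would contain:

spec:
  ...
  maxSize: 10
  minSize: 2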

After that, you need to edit the cluster config.
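Assuming the same cluster name and state bucket as before:

kops edit cluster --name $CLUSTER_NAME --state s3://$BUCKET_NAME

Then set the following additional policy for the nodes: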

kind: Cluster
...
spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup",
            "autoscaling:DescribeTags"
          ],
          "Resource": "*"
        }
      ]
...

And finally, we’d need to update the cluster. Because we want to use Terraform, we need to do this:

kops update cluster --name $CLUSTER_NAME --state s3://$BUCKET_NAME --target terraform --out .

After this, we need to apply the Terraform plan. I encourage you to do that through your CI/CD pipeline; otherwise, apply it manually.

Finally, do a rolling update:

kops rolling-update cluster --name $CLUSTER_NAME --state s3://$BUCKET_NAME --yes

Cluster autoscaling

Once we have our cluster with max size set and the nodes with the right policies, we need to download the Cluster Autoscaler manifest:

https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
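For example, with curl:

curl -fsSL -o cluster-autoscaler-autodiscover.yaml https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml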

We need to change some values:

  • Replace “YOUR CLUSTER NAME” with the name of your cluster.
  • Set the right image tag in k8s.gcr.io/cluster-autoscaler:XX.XX.XX; the Cluster Autoscaler minor version should match your Kubernetes minor version.
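Then apply the edited manifest:

kubectl apply -f cluster-autoscaler-autodiscover.yaml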

This should be enough. Scaling a deployment to more pods than the current nodes can hold should trigger a new node.
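A quick way to test it (my-app is a hypothetical deployment; any deployment large enough to exceed the current capacity will do):

kubectl scale deployment my-app --replicas=20

You can then watch the new node join with kubectl get nodes -w.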