Solved: GKE Auto Scaling down to shut down resource usage and save costs.
-
I'm looking to scale down a large GKE cluster in a non-prod environment to save costs. I have two main goals:
- Scale down to use very minimal resources (basically a shutdown).
- Fast, automated restore.
It's important that the database's persistent disk stays attached, and I don't mind keeping the database active while scaling the other services down to zero.
Just looking for some thoughts on the subject.
-
Scale Down
######################################
## Save deployment state (excludes kube*, mongo and k8s-api-proxy deployments)
######################################
kubectl get deploy -A --no-headers | grep -v -E 'kube|mongo|k8s-api-proxy' > deploy_state_before_scale.txt

######################################
## Copy deployment state to GCS bucket
######################################
gsutil cp deploy_state_before_scale.txt gs://<app1>

#######################################
## Scale deployments to zero
#######################################
kubectl get deploy -A --no-headers | grep -v -E 'kube|mongo|k8s-api-proxy' | awk '{print $1,$2}' | while read NS DEPLOY; do kubectl scale --replicas=0 deployment/$DEPLOY -n $NS; done

#######################################
## Scale DaemonSets to zero by pinning them to a node label that doesn't exist
#######################################
kubectl -n <namespace> patch daemonset <name-of-daemon-set> -p '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}'

#######################################
## Turn off the autoscaler on the GKE node pools
#######################################
gcloud container clusters update <app1-cluster> --no-enable-autoscaling --region <region> --node-pool <app1nodepool1>
gcloud container clusters update <app1-cluster> --no-enable-autoscaling --region <region> --node-pool <app1nodepool2>

#######################################
## Resize node pools to zero
#######################################
gcloud container clusters resize <app1-cluster> --num-nodes 0 --region <region> --node-pool <app1nodepool1>
gcloud container clusters resize <app1-cluster> --num-nodes 0 --region <region> --node-pool <app1nodepool2>
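A quick sanity check after the scale-down, just a sketch of what I'd look at (same placeholders as above):

# everything that was scaled down should now show 0/0
kubectl get deploy -A
# only nodes from pools that were not resized (if any) should remain
kubectl get nodes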
Scale Up
#######################################
## Resize each node pool back to 1 node
#######################################
gcloud container clusters resize <app1-cluster> --num-nodes 1 --region <region> --node-pool <app1nodepool1>
gcloud container clusters resize <app1-cluster> --num-nodes 1 --region <region> --node-pool <app1nodepool2>

#######################################
## Turn autoscaling back on
## (you may need --min-nodes/--max-nodes here if the pools need their bounds set again)
#######################################
gcloud container clusters update <app1-cluster> --enable-autoscaling --region <region> --node-pool <app1nodepool1>
gcloud container clusters update <app1-cluster> --enable-autoscaling --region <region> --node-pool <app1nodepool2>

#####################################################
## Copy saved deployment state from GCS bucket
#####################################################
gsutil cp gs://<app1>/deploy_state_before_scale.txt .

#####################################################
## Scale deployments using the previously saved state file
## (column 4 of the saved `kubectl get deploy` output is UP-TO-DATE, used here as the replica count)
#####################################################
awk '{print $1,$2,$4}' deploy_state_before_scale.txt | while read NS DEPLOY SCALE; do kubectl scale --replicas=$SCALE deployment/$DEPLOY -n $NS; done

#####################################################
## Scale DaemonSets back up by removing the bogus nodeSelector
#####################################################
kubectl -n <namespace> patch daemonset <name-of-daemon-set> --type json -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]'
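To confirm the restore actually finished, something like this waits on every deployment in the saved state file:

# wait until every restored deployment reports its pods as ready
awk '{print $1,$2}' deploy_state_before_scale.txt | while read NS DEPLOY; do
  kubectl rollout status deployment/$DEPLOY -n $NS --timeout=5m
done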
-
@irj Sorry, that is a subject I don't have experience in.
-
@irj
Interesting. I know nothing about it, but aren't you using the cluster autoscaler? It's supposed to scale up and down automatically as needed with the settings you give it. If it doesn't scale down as far as you like, have a look at the settings.
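From what I've seen the settings are per node pool, something along these lines (cluster and pool names are just placeholders):

# let the pool shrink to zero nodes when idle and grow to 3 under load
gcloud container clusters update <app1-cluster> --enable-autoscaling --min-nodes 0 --max-nodes 3 --region <region> --node-pool <app1nodepool1>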
-
@pete-s said in GKE Auto Scaling down to shut down resource usage and save costs.:
@irj
Interesting. I know nothing about it, but aren't you using the cluster autoscaler? It's supposed to scale up and down automatically as needed with the settings you give it. If it doesn't scale down as far as you like, have a look at the settings.
Autoscaling depends on the apps. If your app can't withstand a shutdown, it's not a good idea. When more nodes are added, the scheduler might move a pod to a different machine.
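If an app really can't tolerate being evicted, one way to tell the autoscaler to leave its pods alone when it drains nodes is a PodDisruptionBudget. A minimal sketch (the namespace and app label are hypothetical):

# block voluntary evictions unless at least one pod stays available
kubectl create poddisruptionbudget my-app-pdb -n <namespace> --selector=app=my-app --min-available=1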
-
I know we talked about this, but for everyone else: one thing that can help here is having multiple node pools. You can have a node pool for the database with a certain node size and a separate pool for the applications that can be a different size. The application node pool can then be scaled down to zero if needed and bumped back up later.
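A rough sketch of that idea, assuming the cluster from the earlier posts (pool names, machine types and the mongo deployment name are placeholders). GKE labels every node with cloud.google.com/gke-nodepool=<pool-name>, so the database can be pinned to its own pool with a nodeSelector:

# dedicated pool for the database, separate pool for the apps
gcloud container node-pools create db-pool --cluster <app1-cluster> --region <region> --machine-type n2-highmem-4 --num-nodes 1
gcloud container node-pools create app-pool --cluster <app1-cluster> --region <region> --machine-type e2-standard-4 --num-nodes 3

# pin the database to its pool; app-pool can then be resized to zero independently
kubectl -n <namespace> patch deployment <mongo-deployment> -p '{"spec":{"template":{"spec":{"nodeSelector":{"cloud.google.com/gke-nodepool":"db-pool"}}}}}'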
-
@stacksofplates said in GKE Auto Scaling down to shut down resource usage and save costs.:
@pete-s said in GKE Auto Scaling down to shut down resource usage and save costs.:
@irj
Interesting. I know nothing about it, but aren't you using the cluster autoscaler? It's supposed to scale up and down automatically as needed with the settings you give it. If it doesn't scale down as far as you like, have a look at the settings.
Autoscaling depends on the apps. If your app can't withstand a shutdown, it's not a good idea. When more nodes are added, the scheduler might move a pod to a different machine.
Yes, but that is why you have settings: how far you want to be able to scale down and how far you want to be able to scale up.
But I don't know much about it beyond what I've picked up from videos.
-
@pete-s said in GKE Auto Scaling down to shut down resource usage and save costs.:
@stacksofplates said in GKE Auto Scaling down to shut down resource usage and save costs.:
@pete-s said in GKE Auto Scaling down to shut down resource usage and save costs.:
@irj
Interesting. I know nothing about it, but aren't you using the cluster autoscaler? It's supposed to scale up and down automatically as needed with the settings you give it. If it doesn't scale down as far as you like, have a look at the settings.
Autoscaling depends on the apps. If your app can't withstand a shutdown, it's not a good idea. When more nodes are added, the scheduler might move a pod to a different machine.
Yes, but that is why you have settings: how far you want to be able to scale down and how far you want to be able to scale up.
But I don't know much about it beyond what I've picked up from videos.
It's not really about cluster settings. Even adding one node can cause a pod to be rescheduled. Then the node has to download the container image and start the app, which can take quite a bit of time depending on your image size. The point was that the only time to enable autoscaling is when you know your app can handle interruptions.
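If you want to see that happening, watching where pods land and how long the image pulls take makes the cost pretty visible (namespace and pod names are placeholders):

# watch pods getting rescheduled onto new nodes as the pool grows
kubectl get pods -n <namespace> -o wide -w

# the pod's events show how long the image pull took on the new node
kubectl describe pod <pod-name> -n <namespace> | grep -A2 Pulling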
-