K8s Auto-scaling and Self-healing

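Self-healing

What is self-healing?

When an application process terminates unexpectedly or a node fails, the system detects the problem quickly and automatically restarts the workload to restore service.
Self-healing also fails over automatically: unhealthy instances stop receiving traffic, which protects the user experience.

Self-healing frees you from 7*24 on-call duty 😀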

First, use kubectl create deployment to create a different kind of workload: a Deployment.

For example:

yihui.li@yihuilideMBP k8sdemo % kubectl create deployment hello-docker-flask --image=kalosora/hello-docker-flask:latest --replicas=2
deployment.apps/hello-docker-flask created

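hello-docker-flask: the name of the workload
--image: the image name
--replicas: the number of Pod replicas
You can also add --dry-run=client together with -o (for example -o yaml) to only print the manifest without creating anything.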

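Under the hood, this command generates a Deployment manifest and sends it straight to the cluster. The effect is similar to writing that manifest by hand and running kubectl apply on it, minus the manual step.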
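
For reference, the manifest generated by this command has roughly the shape below. This Python sketch builds it as a dict and prints it as JSON (kubectl also accepts JSON manifests). The app=<name> label and the container name derived from the image are what kubectl create deployment typically produces, but treat them as assumptions and compare with the real --dry-run=client -o yaml output; deployment_manifest is a hypothetical helper.

```python
import json

def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Approximate the Deployment that `kubectl create deployment` generates."""
    labels = {"app": name}  # assumption: kubectl labels everything with app=<name>
    # Assumption: the container name is the image's last path segment without the tag.
    container_name = image.rsplit("/", 1)[-1].split(":", 1)[0]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the Pod template labels, or the API server rejects it.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": container_name, "image": image}]},
            },
        },
    }

if __name__ == "__main__":
    manifest = deployment_manifest("hello-docker-flask", "kalosora/hello-docker-flask:latest", 2)
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running kubectl apply -f on it would be the declarative way to create the same Deployment.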

Next, create a Service with kubectl create service:

yihui.li@yihuilideMBP k8sdemo % kubectl create service clusterip hello-docker-flask --tcp=5000:5000
service/hello-docker-flask created
yihui.li@yihuilideMBP k8sdemo %
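
The important part of the generated Service is its selector: it must match the Pod labels set by the Deployment, because that is how the Service finds its backend Pods. A Python sketch of roughly what kubectl create service clusterip produces, assuming the selector is app=<name> and the port is named "<port>-<targetPort>" (check with --dry-run=client -o yaml); clusterip_service is a hypothetical helper.

```python
import json

def clusterip_service(name: str, port: int, target_port: int) -> dict:
    """Approximate `kubectl create service clusterip NAME --tcp=PORT:TARGET_PORT`."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "type": "ClusterIP",
            # Must match the Deployment's Pod labels (assumed app=<name>).
            "selector": {"app": name},
            "ports": [{
                "name": f"{port}-{target_port}",
                "protocol": "TCP",
                "port": port,               # port on the Service's cluster IP
                "targetPort": target_port,  # port the container listens on
            }],
        },
    }

if __name__ == "__main__":
    print(json.dumps(clusterip_service("hello-docker-flask", 5000, 5000), indent=2))
```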

Then create an Ingress with kubectl create ingress:

kubectl create ingress hello-docker-flask --rule="/=hello-docker-flask:5000"
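
The Ingress created by this command looks roughly like the structure built below. Judging from kubectl's help text for --rule, a path ending in * becomes a Prefix match and any other path an Exact match, so "/" here matches only the root path. A Python sketch under that assumption, handling only host-less rules with a numeric port; ingress_manifest is a hypothetical helper.

```python
import json

def ingress_manifest(name: str, rule: str) -> dict:
    """Approximate `kubectl create ingress NAME --rule="/path=service:port"`
    (no host, no TLS, numeric port only)."""
    path, backend = rule.split("=", 1)
    service, port = backend.rsplit(":", 1)
    # Assumption: a trailing "*" means Prefix matching, anything else Exact.
    if path.endswith("*"):
        path_type, path = "Prefix", path[:-1]
    else:
        path_type = "Exact"
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {"rules": [{"http": {"paths": [{
            "path": path,
            "pathType": path_type,
            "backend": {"service": {"name": service, "port": {"number": int(port)}}},
        }]}}]},
    }

if __name__ == "__main__":
    print(json.dumps(ingress_manifest("hello-docker-flask", "/=hello-docker-flask:5000"), indent=2))
```

Exact is enough for the curl loop used later, which only requests /; a rule such as "/*=hello-docker-flask:5000" would be needed to route sub-paths as well.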

Finally, deploy the ingress-nginx controller:

yihui.li@yihuilideMBP k8sdemo % kubectl create -f https://ghfast.top/https://raw.githubusercontent.com/lyzhang1999/resource/main/ingress-nginx/ingress-nginx.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
serviceaccount/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
configmap/ingress-nginx-controller created
service/ingress-nginx-controller created
service/ingress-nginx-controller-admission created
deployment.apps/ingress-nginx-controller created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
ingressclass.networking.k8s.io/nginx created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
yihui.li@yihuilideMBP k8sdemo %

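After these steps:

The Deployment manages the Pods' lifecycle, for example creating and deleting them;
The Service acts like the load balancer of an auto-scaling group: it distributes traffic across the Pod replicas (with kube-proxy in iptables mode each new connection goes to a randomly chosen Pod; in IPVS mode the default is round-robin);
The Ingress is the cluster's entry point for external traffic.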
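
The random selection used by kube-proxy's iptables mode can be illustrated with a short Python simulation. This is a simplified model, not kube-proxy code: with n endpoints, rule i matches with probability 1/(n - i) and the last rule always matches, which works out to an equal 1/n share for every Pod.

```python
import random
from collections import Counter

def pick_endpoint(endpoints: list[str], rng: random.Random) -> str:
    """Simulate iptables-style selection: rule i fires with probability 1/(n - i);
    the last rule matches unconditionally."""
    n = len(endpoints)
    for i, ep in enumerate(endpoints[:-1]):
        if rng.random() < 1 / (n - i):
            return ep
    return endpoints[-1]

def simulate(endpoints: list[str], requests: int, seed: int = 0) -> Counter:
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return Counter(pick_endpoint(endpoints, rng) for _ in range(requests))

if __name__ == "__main__":
    pods = ["hello-docker-flask-5dccc98654-txhbk", "hello-docker-flask-5dccc98654-ccjnc"]
    print(simulate(pods, 10_000))
```

Because each pick is independent, the same Pod can serve several requests in a row even though the long-run split is even.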

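Now let's start the K8s self-healing experiment.

Check the existing Pods

With ingress-nginx in place, we no longer need port forwarding to reach the Pods; we can simply access 127.0.0.1. The command below sends a request every second and prints the response followed by the current time: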

yihui.li@yihuilideMBP k8sdemo % while true; do sleep 1; curl http://127.0.0.1; echo -e '\n'$(date);done
Hello, my first docker images! hello-docker-flask-5dccc98654-txhbk
2025年 3月15日 星期六 22时42分21秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-ccjnc
2025年 3月15日 星期六 22时42分22秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-ccjnc
2025年 3月15日 星期六 22时42分23秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-txhbk
2025年 3月15日 星期六 22时42分24秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-txhbk
2025年 3月15日 星期六 22时42分25秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-ccjnc
2025年 3月15日 星期六 22时42分26秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-ccjnc
2025年 3月15日 星期六 22时42分27秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-txhbk
2025年 3月15日 星期六 22时42分28秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-ccjnc
2025年 3月15日 星期六 22时42分29秒 CST
Hello, my first docker images! hello-docker-flask-5dccc98654-ccjnc
2025年 3月15日 星期六 22时42分30秒 CST

As the output shows, requests are spread across both Pods: both Pod names keep appearing, although not in a strictly alternating order.
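
Instead of eyeballing the output, the responses can be counted per Pod. A small Python sketch, assuming every response line has the form "Hello, my first docker images! <pod-name>" as above; SAMPLE_OUTPUT reproduces the ten responses shown (plus one date line to show that such lines are skipped), and tally_pods is a hypothetical helper.

```python
from collections import Counter

PREFIX = "Hello, my first docker images! "

def tally_pods(lines: list[str]) -> Counter:
    """Count responses per Pod; lines without the response prefix (e.g. dates) are ignored."""
    return Counter(line[len(PREFIX):].strip() for line in lines if line.startswith(PREFIX))

# The ten responses from the curl loop above, in order (Pod-name suffixes only).
_ORDER = ["txhbk", "ccjnc", "ccjnc", "txhbk", "txhbk",
          "ccjnc", "ccjnc", "txhbk", "ccjnc", "ccjnc"]
SAMPLE_OUTPUT = [PREFIX + "hello-docker-flask-5dccc98654-" + s for s in _ORDER]
SAMPLE_OUTPUT.append("2025年 3月15日 星期六 22时42分21秒 CST")  # a date line, skipped

if __name__ == "__main__":
    for pod, n in tally_pods(SAMPLE_OUTPUT).most_common():
        print(pod, n)
```

For those ten responses it counts 6 for ccjnc and 4 for txhbk: roughly even, but not strictly alternating.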

Next, simulate one of the Pods going down and watch the responses.

Open a new terminal window and kill the python process inside one of the Pods:

kubectl exec -it hello-docker-flask-5dccc98654-ccjnc -- bash -c "killall python3"

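After a few seconds, all traffic is routed to hello-docker-flask-5dccc98654-txhbk.

Shortly afterwards, hello-docker-flask-5dccc98654-ccjnc is serving traffic again. Killing python3 (presumably the container's main process) makes the container exit; while it is down, the Pod is not Ready and is taken out of the Service's endpoints, and the kubelet restarts the container in place. The Pod keeps its name, and its RESTARTS count in kubectl get pods goes up by one.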
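
If the process kept dying, the kubelet would not restart it immediately every time. According to the Kubernetes Pod lifecycle documentation, restarts follow an exponential back-off delay (10s, 20s, 40s, …) capped at five minutes, and the timer is reset once a container has run for 10 minutes without problems. A Python sketch of that documented delay sequence (illustrative only, not kubelet code; restart_delays is a hypothetical helper):

```python
def restart_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Back-off delays in seconds for successive restarts: start at `base`,
    double each time, never exceed `cap` (10s and 5 min per the K8s docs)."""
    delays, delay = [], base
    for _ in range(restarts):
        delays.append(min(delay, cap))
        delay *= 2
    return delays

if __name__ == "__main__":
    print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

A container stuck in this loop shows up as CrashLoopBackOff in kubectl get pods; a single kill like the one above just causes one restart.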

Auto-scaling

Auto-scaling relies on the resource metrics provided by the K8s Metrics Server, so we need to install it first:

yihui.li@yihuilideMBP k8sdemo % kubectl apply -f https://ghfast.top/https://raw.githubusercontent.com/lyzhang1999/resource/main/metrics/metrics.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

After installation, wait until the Metrics Server workload is ready:

yihui.li@yihuilideMBP k8sdemo % kubectl wait deployment -n kube-system metrics-server --for condition=Available=True --timeout=90s
deployment.apps/metrics-server condition met
yihui.li@yihuilideMBP k8sdemo %

Once the Metrics Server is ready, use the kubectl autoscale command to create an auto-scaling policy (a HorizontalPodAutoscaler) for the Deployment:

yihui.li@yihuilideMBP k8sdemo % kubectl autoscale deployment hello-docker-flask --cpu-percent=50 --min=2 --max=10
horizontalpodautoscaler.autoscaling/hello-docker-flask autoscaled
yihui.li@yihuilideMBP k8sdemo %

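--cpu-percent is the target average CPU utilization, measured against the Pods' CPU requests; when the average rises above 50%, replicas are added (and removed again when it falls back below the target);

--min is the minimum number of Pod replicas;

--max is the maximum number of replicas. In other words, the autoscaler keeps the Deployment between 2 and 10 replicas depending on CPU utilization.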
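
The scaling decision itself follows the rule described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the --min/--max range, with no change while the ratio stays within the default 10% tolerance. A simplified Python sketch that ignores details such as unready Pods and stabilization windows; desired_replicas is a hypothetical helper.

```python
import math

def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """desired = ceil(current * current_util / target_util), clamped to [min, max];
    unchanged if the ratio is within `tolerance` of 1.0."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))

if __name__ == "__main__":
    # 2 replicas averaging 150% of their CPU request vs. the 50% target
    print(desired_replicas(2, 150, 50, min_replicas=2, max_replicas=10))  # 6
```

current_util is the average across all Pods, so a single heavily loaded Pod can be enough to push the average over the target. In the real controller, scale-down is additionally delayed by a stabilization window (5 minutes by default).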

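Finally, for auto-scaling to work, the hello-docker-flask Deployment we just created also needs resource requests: the HPA computes CPU utilization as a percentage of each container's CPU request, so without a request it cannot calculate a utilization value. Set the requests with the following command: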

yihui.li@yihuilideMBP k8sdemo % kubectl patch deployment hello-docker-flask --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/resources", "value": {"requests": {"memory": "100Mi", "cpu": "100m"}}}]'
deployment.apps/hello-docker-flask patched
yihui.li@yihuilideMBP k8sdemo %
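
To see what these numbers mean, here is a Python sketch that converts the two quantities from the patch and computes utilization the way the HPA defines it (usage divided by request). It only understands the suffixes used here (m for CPU; Ki, Mi, Gi for memory) and is not Kubernetes' own quantity parser; the function names are hypothetical.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "100m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)  # whole or fractional cores, e.g. "0.5"

def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity with a binary suffix such as "100Mi" to bytes."""
    factors = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    suffix = quantity[-2:]
    if suffix in factors:
        return int(quantity[:-2]) * factors[suffix]
    return int(quantity)  # no suffix: plain bytes

def cpu_utilization_percent(usage_millicores: float, cpu_request: str) -> float:
    """Utilization as the HPA defines it: actual usage divided by the CPU request."""
    return 100 * usage_millicores / cpu_to_millicores(cpu_request)

if __name__ == "__main__":
    print(cpu_to_millicores("100m"))            # 100
    print(memory_to_bytes("100Mi"))             # 104857600
    print(cpu_utilization_percent(80, "100m"))  # 80.0
```

With the 100m request, a Pod using 80 millicores of CPU is already at 80% utilization, well above the 50% target.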

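Because the Pod template has changed, the Deployment now replaces the old Pods with two new ones (note the new hash 7b8f894675 in their names). The following command lists only the running Pods: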

yihui.li@yihuilideMBP k8sdemo % kubectl get pod --field-selector=status.phase==Running
NAME                                  READY   STATUS    RESTARTS   AGE
hello-docker-flask-7b8f894675-bvnj6   1/1     Running   0          81s
hello-docker-flask-7b8f894675-pn5br   1/1     Running   0          86s
yihui.li@yihuilideMBP k8sdemo %

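Pick one of the Pods and enter its container with kubectl exec (for example kubectl exec -it hello-docker-flask-7b8f894675-bvnj6 -- bash). To simulate a traffic peak, use ab to send concurrent requests (-c 50: 50 requests at a time; -n 10000: 10,000 requests in total):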

root@hello-docker-flask-7b8f894675-bvnj6:/app#
root@hello-docker-flask-7b8f894675-bvnj6:/app# ab -c 50 -n 10000 http://127.0.0.1:5000/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
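
If ab is not available in an image, a rough Python stand-in can generate similar load. This is a minimal standard-library sketch (a thread pool of `concurrency` workers sending GET requests); run_load is a hypothetical helper, and its throughput will be far lower than ab's.

```python
import concurrent.futures
import urllib.request

def run_load(url: str, concurrency: int, total: int, timeout: float = 5.0) -> dict:
    """Send `total` GET requests to `url` with at most `concurrency` in flight
    (roughly what `ab -c CONCURRENCY -n TOTAL URL` does)."""

    def one_request(_: int) -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()  # drain the body so the connection is released
                return resp.status == 200
        except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
            return False

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        ok = sum(pool.map(one_request, range(total)))
    return {"ok": ok, "failed": total - ok}
```

Inside the container, run_load("http://127.0.0.1:5000/", concurrency=50, total=10_000) mirrors the ab invocation above. Like ab, it runs in the same container, so the load generator itself also consumes that Pod's CPU.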

Open a new terminal window and keep watching the Pods; with --watch, the command does not exit and prints each change as it happens:

kubectl get pods --watch

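You can see that K8s has picked up the increased CPU load on the Pods and is scaling the Deployment out horizontally by adding replicas.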

At the end of the experiment, run kind delete cluster to delete the cluster:

(base) yihui.li@yihuilideMBP ~ % kind delete cluster
Deleting cluster "kind" ...
Deleted nodes: ["kind-control-plane"]