Openshift中Pod的SpringBoot2应用程序健康检查

1. 准备测试的SpringBoot工程, 需要Java 8 JDK or greater and Maven 3.3.x or greater.

git clone https://github.com/megadotnet/Openshift-healthcheck-demo.git

SRE实战 互联网时代守护先锋,助力企业售后服务体系运筹帷幄!一键直达领取阿里云限量特价优惠。

假设您已经掌握基本JAVA应用程序开发,Openshift容器平台已经部署成功。我们的测试工程依赖库Spring Boot Actuator2, 有如新特性

    • 支持 Jersey RESTful Web 服务
    • 支持基于反应式理念的 WebFlux Web App
    • 新的端点映射
    • 简化用户自定义端点的创建
    • 增强端点的安全性

Actuator提供了13个接口,如下:

Openshift中Pod的SpringBoot2健康检查 Cloud 第1张

在Spring Boot 2.x中为了安全起见,Actuator只开放了两个端点/actuator/health和/actuator/info,可以在配置文件中设置开关。

部署编译jar文件到Openshift容器平台

Openshift的部署过程简述如下(这里采用是二进制部署方式):

--> Found image dc046fe (16 months old) in image stream "openshift/s2i-java" under tag "latest" for "s2i-java:latest"

Java S2I builder 1.0

--------------------

Platform for building Java (fatjar) applications with maven or gradle

Tags: builder, maven-3, gradle-2.6, java, microservices, fatjar

* A source build using binary input will be created

* The resulting image will be pushed to image stream "health-demo:latest"

* A binary build was created, use 'start-build --from-dir' to trigger a new build

--> Creating resources with label app=health-demo ...

imagestream "health-demo" created

buildconfig "health-demo" created

--> Success

Uploading directory "oc-build" as binary input for the build ...

build "health-demo-1" started

--> Found image fb46616 (5 minutes old) in image stream "hshreport-stage/health-demo" under tag "latest" for "health-demo:latest"

Java S2I builder 1.0

--------------------

Platform for building Java (fatjar) applications with maven or gradle

Tags: builder, maven-3, gradle-2.6, java, microservices, fatjar

* This image will be deployed in deployment config "health-demo"

* Ports 7575/tcp, 8080/tcp will be load balanced by service "health-demo"

* Other containers can access this service through the hostname "health-demo"

--> Creating resources with label app=health-demo ...

deploymentconfig "health-demo" created

service "health-demo" created

--> Success

Run 'oc status' to view your app.

route "health-demo" exposed

以上过程最近暴露Router, 方便我们演示


演示过程:

第一回合

修改刚部署的deploymentConfig的yaml,增加readiness配置,如下:

---
readinessProbe:
   failureThreshold: 3
   httpGet:
     path: /actuator/health
     port: 8080
     scheme: HTTP
   initialDelaySeconds: 10
   periodSeconds: 10
   successThreshold: 1
   timeoutSeconds: 1

对于以几个参数,我们说明下,大家需要理解

      • initialDelaySeconds:容器启动后第一次执行探测是需要等待多少秒。
      • periodSeconds:执行探测的频率。默认是10秒,最小1秒。
      • timeoutSeconds:探测超时时间。默认1秒,最小1秒。
      • successThreshold:探测失败后,最少连续探测成功多少次才被认定为成功。默认是 1。对于 liveness 必须是 1。最小值是 1。
      • failureThreshold:探测成功后,最少连续探测失败多少次才被认定为失败。默认是 3。最小值是 1。

或是采用openshift cli命令行来配置readiness:

oc set probe dc/app-cli \

--readiness \

--get-url=http://:8080/notreal \

--initial-delay-seconds=5


$ oc get pod -w

#刚才修改 deploymentConfig,pod重新部署了

NAME READY                 STATUS   RESTARTS  AGE

health-demo-1-build 0/1 Completed    0        16m

health-demo-2-sqh4z 1/1 Running      0        11m

执行HTTP API 来STOP停止Tomcat,curl http://${value-name-app}-MY_PROJECT_NAME.LOCAL_OPENSHIFT_HOSTNAME/api/stop,   注意此处URL相部署的DNS环境有关系

程序中日志如下:

Stopping Tomcat context.

2020-01-11 22:17:21.004 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:22.008 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:23.012 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:23.114 INFO 1 --- [nio-8080-exec-9] o.a.c.c.C.[Tomcat].[localhost].[/] : Destroying Spring FrameworkServlet 'dispatcherServlet'

#跟踪pod的动态

$ oc get pod –w

NAME READY STATUS RESTARTS AGE

health-demo-1-build 0/1 Completed 0 16m

health-demo-2-sqh4z 1/1 Running 0 11m

health-demo-2-sqh4z 0/1 Running 0 13m

#我们查阅pod的详细描述

$ oc describe pod/health-demo-2-sqh4z

Name: health-demo-2-sqh4z

Namespace: hshreport-stage

Security Policy: restricted

Node: openshift-lb-02.hsh.io/10.108.78.145

Start Time: Sat, 11 Jan 2020 22:08:59 +0800

Labels: app=health-demo

deployment=health-demo-2

deploymentconfig=health-demo

Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"hshreport-stage","name":"health-demo-2","uid":"e6436263-347b-11ea-856c...

openshift.io/deployment-config.latest-version=2

openshift.io/deployment-config.name=health-demo

openshift.io/deployment.name=health-demo-2

openshift.io/generated-by=OpenShiftNewApp

openshift.io/scc=restricted

Status: Running

IP: 10.131.5.124

Controllers: ReplicationController/health-demo-2

Containers:

health-demo:

Container ID: docker://25cdf63f55d839610287b4e2a3cc67182377bfe5010990357f83329286c7e64f

Image: docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Image ID: docker-pullable://docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Ports: 7575/TCP, 8080/TCP

State: Running

Started: Sat, 11 Jan 2020 22:09:09 +0800

Ready: False

Restart Count: 0

Readiness: http-get http://:8080/actuator/health delay=10s timeout=1s period=10s #success=1 #failure=3

Environment:

APP_OPTIONS: -Xmx512m -Xss512k -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8

DEPLOYER: liu.xxxxx (Administrator) (cicd-1.1.24)

REVISION:

SPRING_PROFILES_ACTIVE: stage

TZ: Asia/Shanghai

Mounts:

/var/run/secrets/kubernetes.io/serviceaccount from default-token-n4klp (ro)

Conditions:

Type Status

Initialized True

Ready False

PodScheduled True

Volumes:

default-token-n4klp:

Type: Secret (a volume populated by a Secret)

SecretName: default-token-n4klp

Optional: false

QoS Class: BestEffort

Node-Selectors: region=primary

Tolerations: <none>

Events:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message

--------- -------- ----- ---- ------------- -------- ------ -------

16m 16m 1 default-scheduler Normal Scheduled Successfully assigned health-demo-2-sqh4z to openshift-lb-02.hsh.io

16m 16m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Normal Pulling pulling image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

16m 16m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Normal Pulled Successfully pulled image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

15m 15m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Normal Created Created container

15m 15m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Normal Started Started container

15m 15m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Warning Unhealthy Readiness probe failed: Get http://10.131.5.124:8080/actuator/health: dial tcp 10.131.5.124:8080: getsockopt: connection refused

7m 5m 16 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 404

注意上面的WARN事件,我们发现POD没有被重启,因为我们只配置了readiness

第二回合

对之前部署后应用,增加health check, /actuator/health 是SpringBoot2.0的示例工程默认的健康检查的端点

修改deploymentConfig,增加readiness与liveness

---
livenessProbe:
   failureThreshold: 3
   httpGet:
     path: /actuator/health
     port: 8080
     scheme: HTTP
   initialDelaySeconds: 60
   periodSeconds: 10
   successThreshold: 1
   timeoutSeconds: 1
name: health-demo
ports:
   -
     containerPort: 7575
     protocol: TCP
   -
     containerPort: 8080
     protocol: TCP
readinessProbe:
   failureThreshold: 3
   httpGet:
     path: /actuator/health
     port: 8080
     scheme: HTTP
   initialDelaySeconds: 10
   periodSeconds: 10
   successThreshold: 1
   timeoutSeconds: 1

也可以通过Web 控制台来修改,示例截图如下:

Openshift中Pod的SpringBoot2健康检查 Cloud 第2张

Openshift中Pod的SpringBoot2健康检查 Cloud 第3张

我们看到 WEB UI的配置与yaml中参数是一样的。

oc cli命令行方法 #Configure Liveness/Readiness probes on DCs

oc set probe dc cotd1 --liveness -- echo ok

oc set probe dc/cotd1 --readiness --get-url=http://:8080/index.php --initial-delay-seconds=2

TCP的示例

oc set probe dc/blog --readiness --liveness --open-tcp 8080

移动 probe

$ oc set probe dc/blog --readiness --liveness –remove

执行stop后,此时请求ROUTER已显示,POD还在运行,浏览器返回

Application is not available

通过URL执行后,STOP tomcat,pod中程序部分日志如下:

Stopping Tomcat context.

2020-01-11 22:17:21.004 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:22.008 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:23.012 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:23.114 INFO 1 --- [nio-8080-exec-9] o.a.c.c.C.[Tomcat].[localhost].[/] : Destroying Spring FrameworkServlet 'dispatcherServlet'

过一会儿,我们监控pod动态

$ oc get pod -w

NAME READY STATUS RESTARTS AGE

health-demo-1-build 0/1 Completed 0 33m

health-demo-3-02v11 1/1 Running 0 5m

health-demo-3-02v11 0/1 Running 0 7m

health-demo-3-02v11 0/1 Running 1 7m

$ oc get pod

NAME READY STATUS RESTARTS AGE

health-demo-1-build 0/1 Completed 0 36m

health-demo-3-02v11 1/1 Running 1 8m


请求 curl http://${value-name-app}-MY_PROJECT_NAME.LOCAL_OPENSHIFT_HOSTNAME/api/greeting?name=s2i

后浏览器显示:

{"content":"Hello, s2i!"} (the recovery took 41.783 seconds)

此时POD已经被重启,应用程序的日志如下:

2020-01-11 22:35:13.597 INFO 1 --- [ main] s.b.a.e.w.s.WebMvcEndpointHandlerMapping : Mapped "{[/actuator],methods=[GET],produces=[application/vnd.spring-boot.actuator.v2+json || application/json]}" onto protected java.util.Map<java.lang.String, java.util.Map<java.lang.String, org.springframework.boot.actuate.endpoint.web.Link>> org.springframework.boot.actuate.endpoint.web.servlet.WebMvcEndpointHandlerMapping.links(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)

2020-01-11 22:35:13.750 INFO 1 --- [ main] o.s.j.e.a.AnnotationMBeanExporter : Registering beans for JMX exposure on startup

2020-01-11 22:35:13.873 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''

2020-01-11 22:35:13.882 INFO 1 --- [ main] dev.snowdrop.example.ExampleApplication : Started ExampleApplication in 8.061 seconds (JVM running for 9.682)

2020-01-11 22:35:22.445 INFO 1 --- [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring FrameworkServlet 'dispatcherServlet'

2020-01-11 22:35:22.445 INFO 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization started

2020-01-11 22:35:22.485 INFO 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 39 ms

$ oc describe pod/health-demo-3-02v11

Name: health-demo-3-02v11

Namespace: hshreport-stage

Security Policy: restricted

Node: openshift-node-04.hsh.io/10.108.78.139

Start Time: Sat, 11 Jan 2020 22:32:12 +0800

Labels: app=health-demo

deployment=health-demo-3

deploymentconfig=health-demo

Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"hshreport-stage","name":"health-demo-3","uid":"23ad2f21-347f-11ea-856c...

openshift.io/deployment-config.latest-version=3

openshift.io/deployment-config.name=health-demo

openshift.io/deployment.name=health-demo-3

openshift.io/generated-by=OpenShiftNewApp

openshift.io/scc=restricted

Status: Running

IP: 10.129.5.178

Controllers: ReplicationController/health-demo-3

Containers:

health-demo:

Container ID: docker://3e5a6b081022c914d8e118dce829294570e54f441b84394a2b13f6eebb4f5c74

Image: docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Image ID: docker-pullable://docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Ports: 7575/TCP, 8080/TCP

State: Running

Started: Sat, 11 Jan 2020 22:35:04 +0800

Last State: Terminated

Reason: Error

Exit Code: 143

Started: Sat, 11 Jan 2020 22:32:15 +0800

Finished: Sat, 11 Jan 2020 22:35:03 +0800

Ready: True

Restart Count: 1

Liveness: http-get http://:8080/actuator/health delay=60s timeout=1s period=10s #success=1 #failure=3

Readiness: http-get http://:8080/actuator/health delay=10s timeout=1s period=10s #success=1 #failure=3

Environment:

APP_OPTIONS: -Xmx512m -Xss512k -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8

DEPLOYER: liu.xxxxxx(Administrator) (cicd-1.1.24)

REVISION:

SPRING_PROFILES_ACTIVE: stage

TZ: Asia/Shanghai

Mounts:

/var/run/secrets/kubernetes.io/serviceaccount from default-token-n4klp (ro)

Conditions:

Type Status

Initialized True

Ready True

PodScheduled True

Volumes:

default-token-n4klp:

Type: Secret (a volume populated by a Secret)

SecretName: default-token-n4klp

Optional: false

QoS Class: BestEffort

Node-Selectors: region=primary

Tolerations: <none>

Events:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message

--------- -------- ----- ---- ------------- -------- ------ -------

17m 17m 1 default-scheduler Normal Scheduled Successfully assigned health-demo-3-02v11 to openshift-node-04.hsh.io

15m 14m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 404

15m 14m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 404

17m 14m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Pulling pulling image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

17m 14m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Pulled Successfully pulled image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

17m 14m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Created Created container

14m 14m 1 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Killing Killing container with id docker://health-demo:pod "health-demo-3-02v11_hshreport-stage(27e5a1da-347f-11ea-856c-0050568d3d78)" container "health-demo" is unhealthy, it will be killed and re-created.

17m 14m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Started Started container

第二次我们执行 STOP的HTTP  API

$ oc get pod –w

NAME READY STATUS RESTARTS AGE

health-demo-1-build 0/1 Completed 0 47m

health-demo-3-02v11 1/1 Running 1 19m

health-demo-3-02v11 0/1 Running 1 19m

health-demo-3-02v11 0/1 Running 2 20m

health-demo-3-02v11 1/1 Running 2 20m

$ oc get pod

NAME READY STATUS RESTARTS AGE

health-demo-1-build 0/1 Completed 0 49m

health-demo-3-02v11 1/1 Running 2 21

HTTP 请求返回结果:

{"content":"Hello, s2i!"} (the recovery took 51.984 seconds)

$ oc describe pod/health-demo-3-02v11

Name: health-demo-3-02v11

Namespace: hshreport-stage

Security Policy: restricted

Node: openshift-node-04.hsh.io/10.108.78.139

Start Time: Sat, 11 Jan 2020 22:32:12 +0800

Labels: app=health-demo

deployment=health-demo-3

deploymentconfig=health-demo

Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"hshreport-stage","name":"health-demo-3","uid":"23ad2f21-347f-11ea-856c...

openshift.io/deployment-config.latest-version=3

openshift.io/deployment-config.name=health-demo

openshift.io/deployment.name=health-demo-3

openshift.io/generated-by=OpenShiftNewApp

openshift.io/scc=restricted

Status: Running

IP: 10.129.5.178

Controllers: ReplicationController/health-demo-3

Containers:

health-demo:

Container ID: docker://e12d1975aa26b07643ae1666ae6bce7ceab4f25fb4c6c947427ba526ad6fdf7b

Image: docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Image ID: docker-pullable://docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Ports: 7575/TCP, 8080/TCP

State: Running

Started: Sat, 11 Jan 2020 22:47:14 +0800

Last State: Terminated

Reason: Error

Exit Code: 143

Started: Sat, 11 Jan 2020 22:35:04 +0800

Finished: Sat, 11 Jan 2020 22:47:02 +0800

Ready: True

Restart Count: 2

Liveness: http-get http://:8080/actuator/health delay=60s timeout=1s period=10s #success=1 #failure=3

Readiness: http-get http://:8080/actuator/health delay=10s timeout=1s period=10s #success=1 #failure=3

Environment:

APP_OPTIONS: -Xmx512m -Xss512k -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8

DEPLOYER: liu.xxxxx (Administrator) (cicd-1.1.24)

REVISION:

SPRING_PROFILES_ACTIVE: stage

TZ: Asia/Shanghai

Mounts:

/var/run/secrets/kubernetes.io/serviceaccount from default-token-n4klp (ro)

Conditions:

Type Status

Initialized True

Ready True

PodScheduled True

Volumes:

default-token-n4klp:

Type: Secret (a volume populated by a Secret)

SecretName: default-token-n4klp

Optional: false

QoS Class: BestEffort

Node-Selectors: region=primary

Tolerations: <none>

Events:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message

--------- -------- ----- ---- ------------- -------- ------ -------

21m 21m 1 default-scheduler Normal Scheduled Successfully assigned health-demo-3-02v11 to openshift-node-04.hsh.io

21m 6m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Pulling pulling image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

19m 6m 6 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 404

19m 6m 6 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 404

18m 6m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Killing Killing container with id docker://health-demo:pod "health-demo-3-02v11_hshreport-stage(27e5a1da-347f-11ea-856c-0050568d3d78)" container "health-demo" is unhealthy, it will be killed and re-created.

21m 6m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Pulled Successfully pulled image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

21m 6m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Created Created container

21m 6m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Started Started container

#看下事件

$ oc get ev

LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE

29m 41m 6 health-demo-3-02v11 Pod spec.containers{health-demo} Warning Unhealthy kubelet, openshift-node-04.hsh.io Liveness probe failed: HTTP probe failed with statuscode: 404

29m 41m 6 health-demo-3-02v11 Pod spec.containers{health-demo} Warning Unhealthy kubelet, openshift-node-04.hsh.io Readiness probe failed: HTTP probe failed with statuscode: 404

29m 41m 2 health-demo-3-02v11 Pod spec.containers{health-demo} Normal Killing kubelet, openshift-node-04.hsh.io Killing container with id docker://health-demo:pod "health-demo-3-02v11_hshreport-stage(27e5a1da-347f-11ea-856c-0050568d3d78)" container "health-demo" is unhealthy, it will be killed and re-created.

19m 19m 1 health-demo-3-02v11 Pod spec.containers{health-demo} Normal Killing kubelet, openshift-node-04.hsh.io Killing container with id docker://health-demo:Need to kill Pod

44m 44m 1 health-demo-3-deploy Pod Normal Scheduled default-scheduler Successfully assigned health-demo-3-deploy to openshift-lb-02.hsh.io

44m 44m 1 health-demo-3-deploy Pod spec.containers{deployment} Normal Pulled kubelet, openshift-lb-02.hsh.io Container image "openshift/origin-deployer:v3.6.1" already present on machine

44m 44m 1 health-demo-3-deploy Pod spec.containers{deployment} Normal Created kubelet, openshift-lb-02.hsh.io Created container

44m 44m 1 health-demo-3-deploy Pod spec.containers{deployment} Normal Started kubelet, openshift-lb-02.hsh.io Started container

44m 44m 1 health-demo-3 ReplicationController Normal SuccessfulCreate replication-controller Created pod: health-demo-3-02v11

19m 19m 1 health-demo-3 ReplicationController Normal SuccessfulDelete

发现,POD的名称health-demo-3-02v11没有变,演示到这儿结束

小结

Liveness probes可做三种检查

HTTP(S) checks—Checks a given URL endpoint served by the container, and evaluates the HTTP response code.

Container execution check—A command, typically a script, that’s run at intervals to verify that the container is behaving as expected. A non-zero exit code from the command results in a liveness check failure.

TCP socket checks—Checks that a TCP connection can be established on a specific TCP port in the application pod.


Readiness和liveness的区别

     readiness 就是意思是否可以访问,liveness就是是否存活。如果一个readiness 为fail 的后果是把这个pod 的所有service 的endpoint里面的改pod ip 删掉,意思就这个pod对应的所有service都不会把请求转到这pod来了。但是如果liveness 检查结果是fail就会直接kill container,当然如果你的restart policy 是always 会重启pod。

Readiness探针和Liveness探针都是用来检测容器进程状态的。区别在于前者关注的是是否把进程的服务地址加入Service的负载均衡列表;而后者则决定是否去重启这个进程来排除故障。它们在进程的整个生命周期中都存在而且同时工作,职责分离。

Kubelet 可以选择是否执行在容器上运行的两种探针执行和做出反应:

  • livenessProbe:指示容器是否正在运行。如果存活探测失败,则 kubelet 会杀死容器,并且容器将受到其 重启策略 的影响。如果容器不提供存活探针,则默认状态为 Success。
  • readinessProbe:指示容器是否准备好服务请求。如果就绪探测失败,端点控制器将从与 Pod 匹配的所有 Service 的端点中删除该 Pod 的 IP 地址。初始延迟之前的就绪状态默认为 Failure。如果容器不提供就绪探针,则默认状态为 Success。


最佳实践

      一般来讲,Readiness的执行间隔要比Liveness设置的较长一点比较好。因为当后端进程负载高的时候,我们可以暂时从转发列表里面摘除,但是Liveness决定的是进程是否重启,其实这个时候进程不一定需要重启。所以Liveness的检测周期可以稍微长一点,另外失败的容忍数量也可以多一点。具体根据实际情况判断吧。


今天先到这儿,希望对云原生,技术领导力, 企业管理,系统架构设计与评估,团队管理, 项目管理, 产品管管,团队建设 有参考作用 , 您可能感兴趣的文章:
Openshift部署流程介绍
Openshift V3系列各组件版本
领导人怎样带领好团队
构建创业公司突击小团队
国际化环境下系统架构演化
微服务架构设计
视频直播平台的系统架构演化
微服务与Docker介绍
Docker与CI持续集成/CD
互联网电商购物车架构演变案例
互联网业务场景下消息队列架构
互联网高效研发团队管理演进之一
消息系统架构设计演进
互联网电商搜索架构演化之一
企业信息化与软件工程的迷思
企业项目化管理介绍
软件项目成功之要素
人际沟通风格介绍一
精益IT组织与分享式领导
学习型组织与企业
企业创新文化与等级观念
组织目标与个人目标
初创公司人才招聘与管理
人才公司环境与企业文化
企业文化、团队文化与知识共享
高效能的团队建设
项目管理沟通计划
构建高效的研发与自动化运维
某大型电商云平台实践
互联网数据库架构设计思路
IT基础架构规划方案一(网络系统规划)
餐饮行业解决方案之客户分析流程
餐饮行业解决方案之采购战略制定与实施流程
餐饮行业解决方案之业务设计流程
供应链需求调研CheckList
企业应用之性能实时度量系统演变

如有想了解更多软件设计与架构, 系统IT,企业信息化, 团队管理 资讯,请关注我的微信订阅号:

Openshift中Pod的SpringBoot2健康检查 Cloud 第4张

作者:Petter Liu
出处:http://www.cnblogs.com/wintersun/
本文版权归作者和博客园共有,欢迎转载,但未经作者同意必须保留此段声明,且在文章页面明显位置给出原文连接,否则保留追究法律责任的权利。 该文章也同时发布在我的独立博客中-Petter Liu Blog。

扫码关注我们
微信号:SRE实战
拒绝背锅 运筹帷幄