Problem description:

1. One worker node in a K8S cluster frequently runs out of disk space, which then causes services on it to fail.

2. /var/log/syslog contains a large number of messages like the following:

[1568405.455565] docker0: port 2(vethfd09262) entered forwarding state
[1568490.807194] aufs au_opts_verify:1612:docker[22618]: dirperm1 breaks the protection by the permission bits on the lower branch
[1568490.839695] aufs au_opts_verify:1612:docker[25041]: dirperm1 breaks the protection by the permission bits on the lower branch

3. /var/log/kern.log shows the following errors:

SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
Mar 31 18:52:08 AQA-Worker-CLD kernel: [292333.759874] cache: nf_conntrack_12(1847:58cc5f8478f68d01290885da9a59e974cf0d4575d5b92047bea0c7fd5f82130f), object size: 312, buffer size: 320, default order: 1, min order: 0
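
To get a rough idea of how often these messages occur, the log files can be counted rather than read by eye. The following is a minimal sketch; the helper name count_matches is hypothetical, and the search strings are simply taken from the log excerpts above.

```shell
#!/bin/sh
# count_matches PATTERN FILE (hypothetical helper)
# Print how many lines of FILE contain the fixed string PATTERN.
count_matches() {
    # -F treats the pattern as a literal string, -c prints the match count.
    # grep exits non-zero when the count is 0, hence "|| true".
    grep -c -F -- "$1" "$2" || true
}

# Example usage on the worker (reading these logs may require root):
#   count_matches 'dirperm1 breaks the protection' /var/log/syslog
#   count_matches 'SLUB: Unable to allocate memory' /var/log/kern.log
```

A count that keeps growing between two runs suggests the problem is ongoing rather than a one-off.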

 

Cause:

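AUFS is unstable: when Docker deletes a container, the deletion does not always complete cleanly. docker ps shows that the container is gone, but its system resources are not released, so disk usage keeps climbing.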
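
One way to watch for this symptom is to track how full the filesystem holding the Docker data is. The sketch below assumes the default Docker root /var/lib/docker and the usual aufs layout beneath it; the helper name fs_use_percent is hypothetical.

```shell
#!/bin/sh
# fs_use_percent PATH (hypothetical helper)
# Print the use percentage (for example "42%") of the filesystem containing PATH.
fs_use_percent() {
    # df -P prints one line per filesystem in POSIX format;
    # field 5 of the data row is the capacity column.
    df -P "$1" | awk 'NR == 2 { print $5 }'
}

# Example usage on the worker (paths assume the default Docker root):
#   fs_use_percent /var/lib/docker
#   sudo du -sh /var/lib/docker/aufs/diff   # space held by aufs layer directories
```

If the percentage keeps rising while docker ps -a lists few or no containers, that matches the leak described above.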

Reference: https://codeday.me/bug/20181115/395036.html

docker info output before the change:

Containers: 0
Images: 0
Storage Driver: aufs
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
<output truncated>
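
When checking many workers, it can be easier to extract just the storage driver than to read the full output. This is a minimal sketch that parses the text format shown above; the function name storage_driver_from_info is hypothetical.

```shell
#!/bin/sh
# storage_driver_from_info (hypothetical helper)
# Read `docker info` text on stdin and print the value of the "Storage Driver" line.
storage_driver_from_info() {
    awk -F': *' '/^ *Storage Driver:/ { print $2; exit }'
}

# Example usage:
#   docker info 2>/dev/null | storage_driver_from_info
# Docker can also print the field directly:
#   docker info --format '{{.Driver}}'
```

On an affected worker this prints aufs.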

 

Solution:

1. sudo systemctl stop docker

2. sudo mv /var/lib/docker /var/lib/docker.bk

3. sudo vim /etc/docker/daemon.json, and set its content to:

{ "storage-driver": "overlay2" }

4. sudo systemctl restart docker
5. docker info:
Containers: 0
Images: 0
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
<output truncated>

Reference: https://docs.docker.com/storage/storagedriver/overlayfs-driver/
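
If the same change has to be applied to several workers, the editing step can be scripted instead of done in vim. The following is a minimal sketch, not a complete migration script: the function name write_storage_driver_config is hypothetical, and it deliberately refuses to overwrite an existing file, because a daemon.json that already exists may carry other settings that should be merged by hand.

```shell
#!/bin/sh
# write_storage_driver_config FILE DRIVER (hypothetical helper)
# Create FILE containing only the storage-driver setting.
write_storage_driver_config() {
    file=$1
    driver=$2
    # Do not clobber an existing configuration; merge it manually instead.
    if [ -e "$file" ]; then
        echo "refusing to overwrite existing $file" >&2
        return 1
    fi
    printf '{ "storage-driver": "%s" }\n' "$driver" > "$file"
}

# Example usage (run as root, with Docker stopped):
#   write_storage_driver_config /etc/docker/daemon.json overlay2
```

After Docker is started again, docker info should report overlay2 as the storage driver, as in step 5.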
