A Journey into Debugging a docker Hang

Background

Recently, while upgrading kubelet, we found that docker had hung on some of our hosts. For how we discovered this, see: docker-hang-死阻塞-kubelet-初始化流程

Now we turn our focus to docker itself: the symptoms when the hang occurs, how it forms, how we localized the problem, and the corresponding fixes. This post records the whole process in detail.

The docker hang

We are no strangers to docker hangs; we have run into quite a few, with a variety of symptoms. Earlier investigations on docker 1.13.1 turned up some clues but never pinned down a root cause, and in most cases we simply restarted docker. This time the hang occurred on docker 18.06.3, and after nearly a week of poking and prodding by our four-person task force, we finally diagnosed the illness. Note that there is far more than one way for docker to hang, so neither the method nor the conclusion here is universally applicable.

Before starting the investigation, let us lay out everything we knew at that point:

  • A specific container was abnormal and did not respond to docker inspect

Beyond that, we knew nothing.

When investigating an unknown problem, the usual approach is to find an entry point, follow the thread, progressively narrow the scope, and finally pin the problem down in the fine details. Here, docker was obviously our only entry point.

Tracing the call chain

First, we wanted a rough picture of docker's overall state. Anyone who develops in Go will immediately think of the debugging Swiss army knife, pprof. With pprof we drew a blueprint of what docker was doing at that moment:

goroutine profile: total 722373
717594 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2132ef 0x7fe8bc238c1a 0x7fe8bd56f7fe 0x7fe8bd56f6bd 0x7fe8bcea8719 0x7fe8bcea938b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
# 0x7fe8bc2132ee sync.runtime_SemacquireMutex+0x3e /usr/local/go/src/runtime/sema.go:71
# 0x7fe8bc238c19 sync.(*Mutex).Lock+0x109 /usr/local/go/src/sync/mutex.go:134
# 0x7fe8bd56f7fd github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+0x8d /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:40
# 0x7fe8bd56f6bc github.com/docker/docker/daemon.(*Daemon).ContainerInspect+0x11c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:29
# 0x7fe8bcea8718 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+0x118 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.go:15
# 0x7fe8bcea938a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:39
# 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
# 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
# 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
# 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
# 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP+0x45 /usr/local/go/src/net/http/server.go:1947
# 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
# 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
# 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP+0xbd /usr/local/go/src/net/http/server.go:2694
# 0x7fe8bc4b88a2 net/http.(*conn).serve+0x652 /usr/local/go/src/net/http/server.go:1830

4175 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2132ef 0x7fe8bc238c1a 0x7fe8bcc2eccf 0x7fe8bd597af4 0x7fe8bcea2456 0x7fe8bcea956b 0x7fe8bcb73dff 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
# 0x7fe8bc2132ee sync.runtime_SemacquireMutex+0x3e /usr/local/go/src/runtime/sema.go:71
# 0x7fe8bc238c19 sync.(*Mutex).Lock+0x109 /usr/local/go/src/sync/mutex.go:134
# 0x7fe8bcc2ecce github.com/docker/docker/container.(*State).IsRunning+0x2e /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/state.go:240
# 0x7fe8bd597af3 github.com/docker/docker/daemon.(*Daemon).ContainerStats+0xb3 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/stats.go:30
# 0x7fe8bcea2455 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersStats+0x1e5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container_routes.go:115
# 0x7fe8bcea956a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersStats)-fm+0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:42
# 0x7fe8bcb73dfe github.com/docker/docker/api/server/router.cancellableHandler.func1+0xce /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/local.go:92
# 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
# 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
# 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
# 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
# 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP+0x45 /usr/local/go/src/net/http/server.go:1947
# 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
# 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
# 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP+0xbd /usr/local/go/src/net/http/server.go:2694
# 0x7fe8bc4b88a2 net/http.(*conn).serve+0x652 /usr/local/go/src/net/http/server.go:1830

1 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2131fb 0x7fe8bc239a3b 0x7fe8bcbb679d 0x7fe8bcc26774 0x7fe8bd570b20 0x7fe8bd56f81c 0x7fe8bd56f6bd 0x7fe8bcea8719 0x7fe8bcea938b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
# 0x7fe8bc2131fa sync.runtime_Semacquire+0x3a /usr/local/go/src/runtime/sema.go:56
# 0x7fe8bc239a3a sync.(*RWMutex).RLock+0x4a /usr/local/go/src/sync/rwmutex.go:50
# 0x7fe8bcbb679c github.com/docker/docker/daemon/exec.(*Store).List+0x4c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec/exec.go:140
# 0x7fe8bcc26773 github.com/docker/docker/container.(*Container).GetExecIDs+0x33 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/container.go:423
# 0x7fe8bd570b1f github.com/docker/docker/daemon.(*Daemon).getInspectData+0x5cf /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:178
# 0x7fe8bd56f81b github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+0xab /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:42
# 0x7fe8bd56f6bc github.com/docker/docker/daemon.(*Daemon).ContainerInspect+0x11c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:29
# 0x7fe8bcea8718 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+0x118 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.go:15
# 0x7fe8bcea938a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:39
# 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
# 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
# 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
# 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
# 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP+0x45 /usr/local/go/src/net/http/server.go:1947
# 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
# 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
# 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP+0xbd /usr/local/go/src/net/http/server.go:2694
# 0x7fe8bc4b88a2 net/http.(*conn).serve+0x652 /usr/local/go/src/net/http/server.go:1830

1 @ 0x7fe8bc202980 0x7fe8bc212946 0x7fe8bc8b6881 0x7fe8bc8b699d 0x7fe8bc8e259b 0x7fe8bc8e1695 0x7fe8bc8c47d5 0x7fe8bd2e0c06 0x7fe8bd2eda96 0x7fe8bc8c42fb 0x7fe8bc8c4613 0x7fe8bd2a6474 0x7fe8bd2e6976 0x7fe8bd3661c5 0x7fe8bd56842f 0x7fe8bcea7bdb 0x7fe8bcea9f6b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
# 0x7fe8bc8b6880 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader+0x100 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222
# 0x7fe8bc8b699c github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress+0x2c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233
# 0x7fe8bc8e259a github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg+0x63a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515
# 0x7fe8bc8e1694 github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg+0x44 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395
# 0x7fe8bc8c47d4 github.com/docker/docker/vendor/google.golang.org/grpc.invoke+0x184 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83
# 0x7fe8bd2e0c05 github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary+0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35
# 0x7fe8bd2eda95 github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm+0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51
# 0x7fe8bc8c42fa github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke+0x10a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35
# 0x7fe8bc8c4612 github.com/docker/docker/vendor/google.golang.org/grpc.Invoke+0xc2 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60
# 0x7fe8bd2a6473 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Start+0xd3 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:421
# 0x7fe8bd2e6975 github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Start+0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:109
# 0x7fe8bd3661c4 github.com/docker/docker/libcontainerd.(*client).Exec+0x4b4 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:381
# 0x7fe8bd56842e github.com/docker/docker/daemon.(*Daemon).ContainerExecStart+0xb4e /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.go:251
# 0x7fe8bcea7bda github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart+0x34a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.go:125
# 0x7fe8bcea9f6a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm+0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:59
# 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
# 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
# 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
# 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
# 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP+0x45 /usr/local/go/src/net/http/server.go:1947
# 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
# 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
# 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP+0xbd /usr/local/go/src/net/http/server.go:2694
# 0x7fe8bc4b88a2 net/http.(*conn).serve+0x652 /usr/local/go/src/net/http/server.go:1830

Note that this is a trimmed-down goroutine dump. From the blueprint above we can draw the following conclusions:

  • (1) 717594 goroutines are blocked in docker inspect
  • (2) 4175 goroutines are blocked in docker stats
  • (3) 1 goroutine is blocked fetching the exec IDs of a docker exec
  • (4) 1 goroutine is blocked in the execution of a docker exec

These conclusions give us a basic picture of why the abnormal container hung: a docker exec in that container (4) never returned, which in turn blocked the fetch of its exec IDs (3); since (3) already held the container lock, docker inspect (1) and docker stats (2) piled up behind it. So the culprit is not docker inspect but docker exec.
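For reference, collecting such a dump is straightforward when dockerd runs with debug enabled, because the daemon then serves pprof on its API socket. A minimal Go sketch, assuming the default socket path (the debug=1 format groups identical stacks with counts, which is the format shown above):

package main

import (
	"context"
	"io"
	"net"
	"net/http"
	"os"
)

func main() {
	// Talk HTTP over the docker unix socket; the host in the URL is a dummy.
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/docker.sock")
			},
		},
	}
	// debug=1 groups identical goroutine stacks and prefixes each with a count.
	resp, err := client.Get("http://docker/debug/pprof/goroutine?debug=1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body)
}

Sending SIGUSR1 to dockerd also makes it dump all goroutine stacks, which can help when the HTTP API itself is wedged.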

To dig deeper, some background is needed first. The complete call path when kubelet starts a container or executes a command inside one is as follows:

+----------------------------------------------------------------+
|                                                                |
|   +------------+                                               |
|   |            |                                               |
|   |  kubelet   |                                               |
|   |            |                                               |
|   +------|-----+                                               |
|          |                                                     |
|          |                                                     |
|   +------v-----+      +---------------+                        |
|   |            |      |               |                        |
|   |  dockerd   ------>|  containerd   |                        |
|   |            |      |               |                        |
|   +------------+      +-------|-------+                        |
|                               |                                |
|                               |                                |
|                       +-------v-------+      +-----------+     |
|                       |               |      |           |     |
|                       |containerd-shim------>|   runc    |     |
|                       |               |      |           |     |
|                       +---------------+      +-----------+     |
|                                                                |
+----------------------------------------------------------------+

dockerd and containerd can be thought of as two layers of nginx-like proxies; containerd-shim is the container's guardian; runc is the workhorse that actually starts containers and executes commands. What runc does is also quite simple: create namespaces according to the user-supplied config, or enter existing ones, and then execute the user's command. In short, creating a container means creating new namespaces and running the user-specified command inside them.
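To make "create namespaces, then run the user's command" concrete, here is a toy Go sketch of the core primitive. This is emphatically not runc's implementation (runc does far more, including performing the namespace setup in C before the Go runtime starts); it just illustrates the idea, and needs root on Linux:

package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Run a shell in fresh UTS, PID and mount namespaces.
	cmd := exec.Command("sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}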

With that background, we moved down the chain to containerd. Fortunately, pprof also gave us a blueprint of containerd at that moment:

goroutine profile: total 430
1 @ 0x7f6e55f82740 0x7f6e55f92616 0x7f6e56a8412c 0x7f6e56a83d6d 0x7f6e56a911bf 0x7f6e56ac6e3b 0x7f6e565093de 0x7f6e5650dd3b 0x7f6e5650392b 0x7f6e56b51216 0x7f6e564e5909 0x7f6e563ec76a 0x7f6e563f000a 0x7f6e563f6791 0x7f6e55fb0151
# 0x7f6e56a8412b github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch+0x24b /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102
# 0x7f6e56a83d6c github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call+0x15c /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:73
# 0x7f6e56a911be github.com/containerd/containerd/linux/shim/v1.(*shimClient).Start+0xbe /go/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1745
# 0x7f6e56ac6e3a github.com/containerd/containerd/linux.(*Process).Start+0x8a /go/src/github.com/containerd/containerd/linux/process.go:125
# 0x7f6e565093dd github.com/containerd/containerd/services/tasks.(*local).Start+0x14d /go/src/github.com/containerd/containerd/services/tasks/local.go:187
# 0x7f6e5650dd3a github.com/containerd/containerd/services/tasks.(*service).Start+0x6a /go/src/github.com/containerd/containerd/services/tasks/service.go:72
# 0x7f6e5650392a github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler.func1+0x8a /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:624
# 0x7f6e56b51215 github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor+0xa5 /go/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29
# 0x7f6e564e5908 github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler+0x168 /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:626
# 0x7f6e563ec769 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC+0x849 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:920
# 0x7f6e563f0009 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream+0x1319 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1142
# 0x7f6e563f6790 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1+0xa0 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:637

Again, we kept only the key goroutine. The stack shows that containerd is blocked waiting to receive the result of the exec; the relevant code confirms it:

func (c *Client) dispatch(ctx context.Context, req *Request, resp *Response) error {
	errs := make(chan error, 1)
	call := &callRequest{
		req:  req,
		resp: resp,
		errs: errs,
	}

	select {
	case c.calls <- call:
	case <-c.done:
		return c.err
	}

	select { // this select matches the goroutine stack above: /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102
	case err := <-errs:
		return filterCloseErr(err)
	case <-c.done:
		return c.err
	}
}

After handing the request to containerd-shim, containerd simply waits for containerd-shim's reply.

Normally, if we could walk the call chain and analyze each component's goroutine dump in turn, we would locate the problem quickly. Unfortunately, the production docker was not running in debug mode, so we could not collect pprof data from containerd-shim, and runc had no pprof either. The route of locating the problem purely through goroutine stacks was therefore blocked.

Still, we had gathered some key information and narrowed the scope to somewhere between containerd-shim and runc. Time to switch tactics.

Investigating the processes

When a component's runtime state can no longer be observed, we change perspective and look at the container's runtime state instead, i.e. the current state of the abnormal container's processes.

Since docker ps worked fine while docker inspect hung, the first step was to locate the abnormal container:

docker ps | grep -v NAME | awk '{print $1}' | while read cid; do echo $cid; docker inspect -f '{{.State.Pid}}' $cid; done

The loop stalls exactly at the abnormal container, since its inspect never returns, so the last echoed ID is the one we want. With that container ID in hand, we can scan every process related to the container:

UID      PID    PPID  C STIME TTY          TIME CMD
root 11646 6655 0 Jun17 ? 00:01:04 docker-containerd-shim -namespace moby -workdir /home/docker_rt/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
root 11680 11646 0 Jun17 ? 00:00:00 /dockerinit
root 15581 11646 0 Jun17 ? 00:00:00 docker-runc --root /var/run/docker/runtime-runc/moby --log /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/log.json --log-format json exec --process /tmp/runc-process616674997 --detach --pid-file /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/0594c5897a41d401e4d1d7ddd44dacdd316c7e7d53bfdae7f16b0f6b26fcbcda.pid bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5
root 15638 15581 0 Jun17 ? 00:00:00 docker-runc init

The key processes are listed above; a few quick notes:

  • 6655: the containerd process
  • 11646: the abnormal container's containerd-shim process
  • 11680: the abnormal container's init process (/dockerinit); viewed from inside the container, PID namespace isolation makes its PID 1
  • 15581: the runc exec process running a user command inside the abnormal container; at this point it has not yet entered the container
  • 15638: the runc init process that enters the container's namespaces on behalf of that command

One more piece of background: when starting a container, a runc init process is created first, which creates and then enters the new container namespaces; when executing a command in a running container, a runc init process is likewise created first to enter the container's existing namespaces. Only after entering the container's isolated namespaces does it execute the user-specified command.

From this process list alone, we cannot tell at a glance which process caused the problem, so we also need each process's current activity. With strace we examined them one by one:

// 11646 (containerd-shim)
Process 11646 attached with 10 threads
[pid 37342] epoll_pwait(5, <unfinished ...>
[pid 11656] futex(0x818cc0, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11655] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 11654] futex(0x818bd8, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11653] futex(0x7fc730, FUTEX_WAKE, 1 <unfinished ...>
[pid 11651] futex(0xc4200b4148, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11650] futex(0xc420082948, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11649] futex(0xc420082548, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11647] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 11646] futex(0x7fd008, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11653] <... futex resumed> ) = 0
[pid 11647] <... restart_syscall resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 11653] epoll_wait(4, <unfinished ...>
[pid 11647] pselect6(0, NULL, NULL, NULL, {0, 20000}, 0) = 0 (Timeout)
[pid 11647] futex(0x7fc730, FUTEX_WAIT, 0, {60, 0}


// 15581 (runc exec)
Process 15581 attached with 7 threads
[pid 15619] read(6, <unfinished ...>
[pid 15592] futex(0xc4200be148, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15591] futex(0x7fd6d25f6238, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15590] futex(0xc420084d48, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15586] futex(0x7fd6d25f6320, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15584] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 15581] futex(0x7fd6d25d9b28, FUTEX_WAIT, 0, NULL


// 15638 (runc init)
Process 15638 attached with 7 threads
[pid 15648] futex(0x7f512cea5320, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15647] futex(0x7f512cea5238, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15645] futex(0xc4200bc148, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15643] futex(0xc420082d48, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15642] futex(0xc420082948, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15639] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 15638] write(2, "/usr/local/go/src/runtime/proc.g"..., 33

From the activity of these related processes we can conclude:

  • runc exec is blocked reading from FD 6
  • runc init is blocked writing to FD 2

So which files are actually behind these FDs? lsof can tell us:

// 15638 (runc init)
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
runc:[2:I 15638 root cwd DIR 0,41 192 1066743071 /
runc:[2:I 15638 root rtd DIR 0,41 192 1066743071 /
runc:[2:I 15638 root txt REG 0,4 7644224 1070360467 /memfd:runc_cloned:/proc/self/exe (deleted)
runc:[2:I 15638 root mem REG 8,3 2107816 1053962 /usr/lib64/libc-2.17.so
runc:[2:I 15638 root mem REG 8,3 19512 1054285 /usr/lib64/libdl-2.17.so
runc:[2:I 15638 root mem REG 8,3 266688 1050626 /usr/lib64/libseccomp.so.2.3.1
runc:[2:I 15638 root mem REG 8,3 142296 1055698 /usr/lib64/libpthread-2.17.so
runc:[2:I 15638 root mem REG 8,3 27168 3024893 /usr/local/gundam/gundam_client/preload/lib64/gundam_preload.so
runc:[2:I 15638 root mem REG 8,3 164432 1054515 /usr/lib64/ld-2.17.so
runc:[2:I 15638 root 0r FIFO 0,8 0t0 1070361745 pipe
runc:[2:I 15638 root 1w FIFO 0,8 0t0 1070361746 pipe
runc:[2:I 15638 root 2w FIFO 0,8 0t0 1070361747 pipe
runc:[2:I 15638 root 3u unix 0xffff881ff8273000 0t0 1070361341 socket
runc:[2:I 15638 root 5u a_inode 0,9 0 7180 [eventpoll]


// 15581 (runc exec)
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
docker-ru 15581 root cwd DIR 0,18 120 1066743076 /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5
docker-ru 15581 root rtd DIR 8,3 4096 2 /
docker-ru 15581 root txt REG 8,3 7644224 919775 /usr/bin/docker-runc
docker-ru 15581 root mem REG 8,3 2107816 1053962 /usr/lib64/libc-2.17.so
docker-ru 15581 root mem REG 8,3 19512 1054285 /usr/lib64/libdl-2.17.so
docker-ru 15581 root mem REG 8,3 266688 1050626 /usr/lib64/libseccomp.so.2.3.1
docker-ru 15581 root mem REG 8,3 142296 1055698 /usr/lib64/libpthread-2.17.so
docker-ru 15581 root mem REG 8,3 27168 3024893 /usr/local/gundam/gundam_client/preload/lib64/gundam_preload.so
docker-ru 15581 root mem REG 8,3 164432 1054515 /usr/lib64/ld-2.17.so
docker-ru 15581 root 0r FIFO 0,8 0t0 1070361745 pipe
docker-ru 15581 root 1w FIFO 0,8 0t0 1070361746 pipe
docker-ru 15581 root 2w FIFO 0,8 0t0 1070361747 pipe
docker-ru 15581 root 3w REG 0,18 5456 1066709902 /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/log.json
docker-ru 15581 root 4u a_inode 0,9 0 7180 [eventpoll]
docker-ru 15581 root 6u unix 0xffff881ff8275400 0t0 1070361342 socket


// 11646 (containerd-shim)
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
docker-co 11646 root cwd DIR 0,18 120 1066743076 /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5
docker-co 11646 root rtd DIR 8,3 4096 2 /
docker-co 11646 root txt REG 8,3 4173632 919772 /usr/bin/docker-containerd-shim
docker-co 11646 root 0r CHR 1,3 0t0 2052 /dev/null
docker-co 11646 root 1w CHR 1,3 0t0 2052 /dev/null
docker-co 11646 root 2w CHR 1,3 0t0 2052 /dev/null
docker-co 11646 root 3r FIFO 0,8 0t0 1070361745 pipe
docker-co 11646 root 4u a_inode 0,9 0 7180 [eventpoll]
docker-co 11646 root 5u a_inode 0,9 0 7180 [eventpoll]
docker-co 11646 root 6u unix 0xffff881e8cac2800 0t0 1066743079 @/containerd-shim/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/shim.sock
docker-co 11646 root 7u unix 0xffff881e8cac3400 0t0 1066743968 @/containerd-shim/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/shim.sock
docker-co 11646 root 8r FIFO 0,8 0t0 1066743970 pipe
docker-co 11646 root 9w FIFO 0,8 0t0 1070361745 pipe
docker-co 11646 root 10r FIFO 0,8 0t0 1066743971 pipe
docker-co 11646 root 11u FIFO 0,18 0t0 1066700778 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout
docker-co 11646 root 12r FIFO 0,8 0t0 1066743972 pipe
docker-co 11646 root 13w FIFO 0,18 0t0 1066700778 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout
docker-co 11646 root 14u FIFO 0,18 0t0 1066700778 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout
docker-co 11646 root 15r FIFO 0,18 0t0 1066700778 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout
docker-co 11646 root 16u FIFO 0,18 0t0 1066700779 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr
docker-co 11646 root 17w FIFO 0,18 0t0 1066700779 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr
docker-co 11646 root 18u FIFO 0,18 0t0 1066700779 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr
docker-co 11646 root 19r FIFO 0,18 0t0 1066700779 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr
docker-co 11646 root 20r FIFO 0,8 0t0 1070361746 pipe
docker-co 11646 root 26r FIFO 0,8 0t0 1070361747 pipe

Anyone who puts the strace and lsof output side by side can already draw the conclusion.

runc init is blocked writing to FD 2, whose type is a pipe. A Linux pipe has a fixed default buffer size (64 KiB since kernel 2.6.11); once a writer has filled the buffer and no reader drains it, the writer blocks.
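This failure mode is easy to reproduce in isolation. A minimal Go sketch, unrelated to docker: the writer below stalls for good once the default 64 KiB buffer fills, exactly as runc init did. On a default kernel it prints "written: 65536" and then hangs:

package main

import (
	"fmt"
	"os"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	_ = r // deliberately never read: this plays the role of containerd-shim

	buf := make([]byte, 4096)
	total := 0
	for {
		n, err := w.Write(buf) // blocks forever once the 64 KiB buffer is full
		if err != nil {
			panic(err)
		}
		total += n
		fmt.Println("written:", total)
	}
}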

To summarize: containerd-shim launches runc exec to run a user command inside the container; runc exec launches runc init to enter the container; and runc init blocks because it writes more data to FD 2 than the pipe can hold. With the bottom-most runc init blocked, every process up the call chain blocks with it:

runc init → runc exec → containerd-shim exec → containerd exec → dockerd exec

At this point we understand why docker hung. But two questions remain unanswered:

  • Why would runc init write more data to FD 2 (os.Stderr in Go terms) than the Linux pipe size limit?
  • Why does the runc init problem occur only in this particular container?

If runc init routinely needed to write a lot of data to os.Stdout or os.Stderr, creating any container would fail the same way. So something unknown must have gone wrong in this particular container, causing runc init to unexpectedly write a large amount of data to os.Stderr; whatever is sitting in that pipe is very likely to reveal the anomaly.

So we needed the data runc init was currently writing. Since runc init's FD 2 is an anonymous pipe, we cannot read its contents the way we read a regular file. Thanks to 鹤哥 for braving the pitfalls and finding a way to read the contents of an anonymous pipe:

# cat /proc/15638/fd/2
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f512b7365f7 m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7f512b7365f7
stack: frame={sp:0x7ffe1121a658, fp:0x0} stack=[0x7ffe0ae1bb28,0x7ffe1121ab50)
00007ffe1121a558: 00007ffe1121a6d8 00007ffe1121a6b0
00007ffe1121a568: 0000000000000001 00007f512c527660
00007ffe1121a578: 00007f512c54d560 00007f512c54d208
00007ffe1121a588: 00007f512c333e6f 0000000000000000
00007ffe1121a598: 00007f512c527660 0000000000000005
00007ffe1121a5a8: 0000000000000000 0000000000000001
00007ffe1121a5b8: 00007f512c54d208 00007f512c528000
00007ffe1121a5c8: 00007ffe1121a600 00007f512b704b0c
00007ffe1121a5d8: 00007f512b7110c0 0000000000000000
00007ffe1121a5e8: 00007f512c54d560 00007ffe1121a620
00007ffe1121a5f8: 00007ffe1121a610 000000000f11ed7d
00007ffe1121a608: 00007f512c550153 00000000ffffffff
00007ffe1121a618: 00007f512c550a9b 00007f512b707d00
00007ffe1121a628: 00007f512babc868 00007f512c9e9e5e
00007ffe1121a638: 00007f512d3bb080 00000000000000f1
00007ffe1121a648: 0000000000000011 0000000000000000
00007ffe1121a658: <00007f512b737ce8 0000000000000020
00007ffe1121a668: 0000000000000000 0000000000000000
00007ffe1121a678: 0000000000000000 0000000000000000
00007ffe1121a688: 0000000000000000 0000000000000000
00007ffe1121a698: 0000000000000000 0000000000000000
00007ffe1121a6a8: 0000000000000000 0000000000000000
00007ffe1121a6b8: 0000000000000000 0000000000000000
00007ffe1121a6c8: 0000000000000000 0000000000000000
00007ffe1121a6d8: 0000000000000000 00007f512babc868
00007ffe1121a6e8: 00007f512c9e9e5e 00007f512d3bb080
00007ffe1121a6f8: 00007f512c33f260 00007f512babc1c0
00007ffe1121a708: 00007f512babc1c0 0000000000000001
00007ffe1121a718: 00007f512babc243 00000000000000f1
00007ffe1121a728: 00007f512b7787ec 0000000000000001
00007ffe1121a738: 00007f512babc1c0 000000000000000a
00007ffe1121a748: 00007f512b7e8a4d 000000000000000a
runtime: unknown pc 0x7f512b7365f7
stack: frame={sp:0x7ffe1121a658, fp:0x0} stack=[0x7ffe0ae1bb28,0x7ffe1121ab50)
00007ffe1121a558: 00007ffe1121a6d8 00007ffe1121a6b0
00007ffe1121a568: 0000000000000001 00007f512c527660
00007ffe1121a578: 00007f512c54d560 00007f512c54d208
00007ffe1121a588: 00007f512c333e6f 0000000000000000
00007ffe1121a598: 00007f512c527660 0000000000000005
00007ffe1121a5a8: 0000000000000000 0000000000000001
00007ffe1121a5b8: 00007f512c54d208 00007f512c528000
00007ffe1121a5c8: 00007ffe1121a600 00007f512b704b0c
00007ffe1121a5d8: 00007f512b7110c0 0000000000000000
00007ffe1121a5e8: 00007f512c54d560 00007ffe1121a620
00007ffe1121a5f8: 00007ffe1121a610 000000000f11ed7d
00007ffe1121a608: 00007f512c550153 00000000ffffffff
00007ffe1121a618: 00007f512c550a9b 00007f512b707d00
00007ffe1121a628: 00007f512babc868 00007f512c9e9e5e
00007ffe1121a638: 00007f512d3bb080 00000000000000f1
00007ffe1121a648: 0000000000000011 0000000000000000
00007ffe1121a658: <00007f512b737ce8 0000000000000020
00007ffe1121a668: 0000000000000000 0000000000000000
00007ffe1121a678: 0000000000000000 0000000000000000
00007ffe1121a688: 0000000000000000 0000000000000000
00007ffe1121a698: 0000000000000000 0000000000000000
00007ffe1121a6a8: 0000000000000000 0000000000000000
00007ffe1121a6b8: 0000000000000000 0000000000000000
00007ffe1121a6c8: 0000000000000000 0000000000000000
00007ffe1121a6d8: 0000000000000000 00007f512babc868
00007ffe1121a6e8: 00007f512c9e9e5e 00007f512d3bb080
00007ffe1121a6f8: 00007f512c33f260 00007f512babc1c0
00007ffe1121a708: 00007f512babc1c0 0000000000000001
00007ffe1121a718: 00007f512babc243 00000000000000f1
00007ffe1121a728: 00007f512b7787ec 0000000000000001
00007ffe1121a738: 00007f512babc1c0 000000000000000a
00007ffe1121a748: 00007f512b7e8a4d 000000000000000a

goroutine 1 [running, locked to thread]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:363 fp=0xc4200a3ed0 sp=0xc4200a3ec8 pc=0x7f512c7281d0
runtime.startTheWorld()
/usr/local/go/src/runtime/proc.go:978 +0x2f fp=0xc4200a3ee8 sp=0xc4200a3ed0 pc=0x7f512c70221f
runtime.GOMAXPROCS(0x1, 0xc42013d9a0)
/usr/local/go/src/runtime/debug.go:30 +0xa0 fp=0xc4200a3f10 sp=0xc4200a3ee8 pc=0x7f512c6d9810
main.init.0()
/go/src/github.com/opencontainers/runc/init.go:14 +0x61 fp=0xc4200a3f30 sp=0xc4200a3f10 pc=0x7f512c992801
main.init()
<autogenerated>:1 +0x624 fp=0xc4200a3f88 sp=0xc4200a3f30 pc=0x7f512c9a1014
runtime.main()
/usr/local/go/src/runtime/proc.go:186 +0x1d2 fp=0xc4200a3fe0 sp=0xc4200a3f88 pc=0x7f512c6ff962
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4200a3fe8 sp=0xc4200a3fe0 pc=0x7f512c72ad71

goroutine 6 [syscall]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:139 +0xa8
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:22 +0x24
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:28 +0x43

rax 0x0
rbx 0x7f512babc868
rcx 0xffffffffffffffff
rdx 0x6
rdi 0x271
rsi 0x271
rbp 0x7f512c9e9e5e
rsp 0x7ffe1121a658
r8 0xa
r9 0x7f512c524740
r10 0x8
r11 0x206
r12 0x7f512d3bb080
r13 0xf1
r14 0x11
r15 0x0
rip 0x7f512b7365f7
rflags 0x206
cs 0x33
fs 0x0
gs 0x0
exec failed: container_linux.go:348: starting container process caused "read init-p: connection reset by peer"

Huh? runc init failed to create a thread due to insufficient resources??? This is clearly not runc's own output but an unexpected dump from the Go runtime. So which kind of resource, exactly, was exhausted? Let's cross-check against the /var/log/messages log:

Jun 17 03:18:17 host-xx kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=997
Jun 17 03:18:17 host-xx kernel: CPU: 14 PID: 12788 Comm: runc:[2:INIT] Tainted: G W OE ------------ T 3.10.0-514.16.1.el7.stable.v1.4.x86_64 #1
Jun 17 03:18:17 host-xx kernel: Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
Jun 17 03:18:17 host-xx kernel: ffff88103841dee0 00000000c4394691 ffff880263e4bcb8 ffffffff8168863d
Jun 17 03:18:17 host-xx kernel: ffff880263e4bd50 ffffffff81683585 ffff88203cc5e300 ffff880ee02b2380
Jun 17 03:18:17 host-xx kernel: 0000000000000001 0000000000000000 0000000000000000 0000000000000046
Jun 17 03:18:17 host-xx kernel: Call Trace:
Jun 17 03:18:17 host-xx kernel: [<ffffffff8168863d>] dump_stack+0x19/0x1b
Jun 17 03:18:17 host-xx kernel: [<ffffffff81683585>] dump_header+0x85/0x27f
Jun 17 03:18:17 host-xx kernel: [<ffffffff81185b06>] ? find_lock_task_mm+0x56/0xc0
Jun 17 03:18:17 host-xx kernel: [<ffffffff81185fbe>] oom_kill_process+0x24e/0x3c0
Jun 17 03:18:17 host-xx kernel: [<ffffffff81093c2e>] ? has_capability_noaudit+0x1e/0x30
Jun 17 03:18:17 host-xx kernel: [<ffffffff811f4d91>] mem_cgroup_oom_synchronize+0x551/0x580
Jun 17 03:18:17 host-xx kernel: [<ffffffff811f41b0>] ? mem_cgroup_charge_common+0xc0/0xc0
Jun 17 03:18:17 host-xx kernel: [<ffffffff81186844>] pagefault_out_of_memory+0x14/0x90
Jun 17 03:18:17 host-xx kernel: [<ffffffff816813fa>] mm_fault_error+0x68/0x12b
Jun 17 03:18:17 host-xx kernel: [<ffffffff81694405>] __do_page_fault+0x395/0x450
Jun 17 03:18:17 host-xx kernel: [<ffffffff816944f5>] do_page_fault+0x35/0x90
Jun 17 03:18:17 host-xx kernel: [<ffffffff81690708>] page_fault+0x28/0x30
Jun 17 03:18:17 host-xx kernel: memory: usage 3145728kB, limit 3145728kB, failcnt 14406932
Jun 17 03:18:17 host-xx kernel: memory+swap: usage 3145728kB, limit 9007199254740988kB, failcnt 0
Jun 17 03:18:17 host-xx kernel: kmem: usage 3143468kB, limit 9007199254740988kB, failcnt 0
Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/b761e05249245695278b3f409d2d6e5c6a5bff6995ff0cf44d03af4aa9764a30: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/1d1750ecc627cc5d60d80c071b2eb4d515ee8880c5b5136883164f08319869b0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5: cache:0KB rss:2220KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:2140KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/super-agent: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 17 03:18:17 host-xx kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jun 17 03:18:17 host-xx kernel: [30598] 0 30598 255 1 4 0 -998 pause
Jun 17 03:18:17 host-xx kernel: [11680] 0 11680 164833 1118 20 0 997 dockerinit
Jun 17 03:18:17 host-xx kernel: [12788] 0 12788 150184 1146 23 0 997 runc:[2:INIT]
Jun 17 03:18:17 host-xx kernel: oom-kill:,cpuset=bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5,mems_allowed=0-1,oom_memcg=/kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda,task_memcg=/kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5,task=runc:[2:INIT],pid=12800,uid=0
Jun 17 03:18:17 host-xx kernel: Memory cgroup out of memory: Kill process 12800 (runc:[2:INIT]) score 997 or sacrifice child
Jun 17 03:18:17 host-xx kernel: Killed process 12788 (runc:[2:INIT]) total-vm:600736kB, anon-rss:3296kB, file-rss:276kB, shmem-rss:1012kB

/var/log/messages recorded a flood of OOM events for this container from roughly a month earlier, and the timestamps closely match the start time of the abnormal runc init process.

To summarize why runc init blocked: at a critical moment, runc init failed to create a thread because memory was exhausted (the kernel log above shows the memory cgroup pinned at its 3 GiB limit, with nearly all of it accounted as kmem, so pthread_create got EAGAIN). This triggered an unexpected crash dump from the Go runtime, whose sheer size left runc init blocked writing to the pipe.

At this point the full picture has basically been described. One doubt remains, though: if runc init was writing into a pipe, wasn't some other process supposed to be reading the other end?

Remember the lsof output above? The observant will already have spotted the read end of that pipe: yes, it is containerd-shim, holding the pipe with inode number 1070361747. So why wasn't containerd-shim reading from the pipe? Let's look at the code:

func (e *execProcess) start(ctx context.Context) (err error) {
	......
	if err := e.parent.runtime.Exec(ctx, e.parent.id, e.spec, opts); err != nil { // launch runc exec
		close(e.waitBlock)
		return e.parent.runtimeError(err, "OCI runtime exec failed")
	}
	......
	else if !e.stdio.IsNull() {
		fifoCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
		defer cancel()

		if err := copyPipes(fifoCtx, e.io, e.stdio.Stdin, e.stdio.Stdout, e.stdio.Stderr, &e.wg, &copyWaitGroup); err != nil { // read the pipes
			return errors.Wrap(err, "failed to start io pipe copy")
		}
	}
	......
}


func (r *Runc) Exec(context context.Context, id string, spec specs.Process, opts *ExecOpts) error {
	......
	cmd := r.command(context, append(args, id)...)
	if opts != nil && opts.IO != nil {
		opts.Set(cmd)
	}
	......
	ec, err := Monitor.Start(cmd)
	......
	status, err := Monitor.Wait(cmd, ec)
	......
}

So containerd-shim is designed to wait for runc exec to finish before it starts reading the pipes, while this runc init, stuck on its unexpectedly large output, is blocked writing into those very pipes... a perfect deadlock.
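Reduced to its essence, the deadlock looks like the following sketch (hypothetical code, not the shim's: the child process stands in for runc init, the parent for containerd-shim):

package main

import (
	"io"
	"os"
	"os/exec"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	// The child writes 1 MiB to stderr, far beyond the 64 KiB pipe buffer.
	cmd := exec.Command("sh", "-c", "head -c 1048576 /dev/zero >&2")
	cmd.Stderr = w

	// Run = Start + Wait: we wait for the child before ever reading r,
	// so the child blocks in write(2) and we block in wait(2). Deadlock.
	if err := cmd.Run(); err != nil {
		panic(err)
	}
	w.Close()
	io.Copy(os.Stdout, r) // never reached
}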

At last, the full chain of events behind this docker hang is clear. Now, on to the fixes.

Solutions

Once we understand how the docker hang forms, several targeted fixes present themselves.

The most direct fix

Since docker exec can trigger the hang, we can simply eliminate all docker exec traffic on our systems. The most prominent source is kubelet probes: we currently add a ReadinessProbe to every Pod by default, implemented as an exec into the container. Switching kubelet's probes to tcp or http form would do.

The component change itself is small, but retrofitting business containers is too costly, and migrating existing clusters is a big problem.

The most fundamental fix

Since containerd-shim currently waits for runc exec to finish before reading the pipes, moving the pipe reading ahead of the runc exec invocation would, in theory, also avoid the deadlock.
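In terms of the deadlock sketch above, the fix amounts to starting the reader before waiting on the child; a sketch under the same assumptions:

package main

import (
	"io"
	"os"
	"os/exec"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	cmd := exec.Command("sh", "-c", "head -c 1048576 /dev/zero >&2")
	cmd.Stderr = w

	// Start draining the pipe *before* waiting on the child.
	done := make(chan struct{})
	go func() {
		io.Copy(io.Discard, r)
		close(done)
	}()

	if err := cmd.Run(); err != nil { // the child can now flush all of its output
		panic(err)
	}
	w.Close() // close our copy of the write end so the reader sees EOF
	<-done
}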

Likewise, the upgrade cost of this option is too high: upgrading containerd-shim requires restarting all existing containers, so the plan would almost certainly never pass review.

The simplest fix

Since runc init is blocked writing to the pipe, actively draining the pipe's contents will let runc init exit cleanly.

In automating this workaround, the small challenge is telling whether a given docker hang really is caused by a blocked pipe write. Still, compared with the two options above, I think it is worth a try: its blast radius is negligible.
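A minimal sketch of the mitigation itself, with the stuck runc init's PID hard-coded from the process scan above (a real tool would discover the PID and first confirm that the process is blocked in a pipe write):

package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	// procfs lets us open a fresh read handle on the anonymous pipe behind
	// the stuck process's FD 2, exactly like `cat /proc/15638/fd/2` above.
	f, err := os.Open("/proc/15638/fd/2")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	// Draining the buffer unblocks the writer; once runc init exits and all
	// write ends are closed, the copy finishes with EOF.
	n, err := io.Copy(io.Discard, f)
	fmt.Println("drained bytes:", n, "err:", err)
}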

Follow-up

As it turns out, reading the pipe caused yet another incident of its own; see: 一次读-pipe-引发的血案

Also, there are many more ways for docker to hang than this one, and this investigation's conclusion will not fit every scenario. I hope readers will debug from the evidence at their own scene.

And so ends this docker hang debugging journey.

These conclusions come from several days of joint investigation by our four-person squad @飞哥 @鹤哥 @博哥 @我. If you found this useful, your likes and shares are much appreciated.

If anything in this investigation is mistaken, corrections are welcome.
