Troubleshooting Thread Deadlocks

When CPU usage reaches 100% while memory usage looks normal, an infinite loop somewhere is the likely suspect.

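Use the top command to find the process with high CPU usage, then run top -Hp pid to see the CPU usage of each thread in that process. This narrows the problem down to a specific thread.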
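The per-thread lookup can also be scripted. The sketch below assumes the procps version of ps found on most Linux systems; hottest_thread is a hypothetical helper name, not part of any tool. Note that ps reports CPU usage averaged over the thread's lifetime, not the instantaneous value that top shows, so treat the result as a hint.

```shell
# hottest_thread (hypothetical helper): read "TID %CPU" pairs on stdin
# and print the TID with the highest %CPU value.
hottest_thread() {
  awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; tid = $1 }
       END { if (tid != "") print tid }'
}

# Possible usage (assumes procps ps; -L lists threads, "tid=" and "pcpu="
# suppress the header line):
#   ps -L -p "$pid" -o tid= -o pcpu= | hottest_thread
```

The printed thread ID is the same number that pstack shows as LWP, so it can be used to pick the matching stack.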

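pstack prints the stack of a running process, which makes it very useful when diagnosing process problems. For example, when a service stays in a busy state (it appears hung, as if stuck in an infinite loop), pstack can quickly locate the problem. Run pstack several times over a short period; if the stack keeps stopping at the same place, that place deserves close attention and is probably where the problem lies.
pstack must be run by the owner of the $pid process or by root.
Example: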
pstack 4551
Thread 7 (Thread 1084229984 (LWP 4552)):
#0  0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1  0x00000000006f0730 in ub::EPollEx::poll ()
#2  0x00000000006f172a in ub::NetReactor::callback ()
#3  0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 6 (Thread 1094719840 (LWP 4553)):
#0  0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1  0x00000000006f0730 in ub::EPollEx::poll ()
#2  0x00000000006f172a in ub::NetReactor::callback ()
#3  0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 5 (Thread 1105209696 (LWP 4554)):
#0  0x000000302b80baa5 in __nanosleep_nocancel ()
#1  0x000000000079e758 in comcm::ms_sleep ()
#2  0x00000000006c8581 in ub::UbClientManager::healthyCheck ()
#3  0x00000000006c8471 in ub::UbClientManager::start_healthy_check ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 4 (Thread 1115699552 (LWP 4555)):
#0  0x000000302b80baa5 in __nanosleep_nocancel ()
#1  0x0000000000482b0e in armor::armor_check_thread ()
#2  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#3  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#4  0x0000000000000000 in ?? ()
Thread 3 (Thread 1126189408 (LWP 4556)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x000000000044c972 in Business_config_manager::run ()
#3  0x0000000000457b83 in Thread::run_thread ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 2 (Thread 1136679264 (LWP 4557)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x00000000004524bb in Process_thread::sleep_period ()
#3  0x0000000000452641 in Process_thread::run ()
#4  0x0000000000457b83 in Thread::run_thread ()
#5  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#7  0x0000000000000000 in ?? ()
Thread 1 (Thread 182894129792 (LWP 4551)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x0000000000420d79 in Ad_preprocess::run ()
#3  0x0000000000450ad0 in main ()
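The advice to run pstack several times and look for a stack that never moves can be automated roughly as follows. This is a minimal sketch that assumes pstack output in exactly the format shown above (a "Thread N (... (LWP n)):" header followed by "#0  address in function" lines); top_frames and stuck_threads are hypothetical helper names.

```shell
# top_frames (hypothetical helper): read pstack output on stdin and print
# "<LWP> <function>" for frame #0 of every thread.
top_frames() {
  awk '
    /^Thread [0-9]+ / {
      # Header format: Thread 7 (Thread 1084229984 (LWP 4552)):
      lwp = $0
      sub(/.*LWP /, "", lwp)      # drop everything up to the LWP number
      sub(/[^0-9].*/, "", lwp)    # drop everything after it
    }
    /^#0 / { print lwp, $4 }      # fields: #0, address, "in", function
  '
}

# stuck_threads (hypothetical helper): read the concatenated top_frames
# output of N samples on stdin; print the threads whose frame #0 was the
# same in all N samples. $1 is N.
stuck_threads() {
  sort | uniq -c | awk -v n="$1" '$1 == n { print $2, $3 }'
}

# Possible usage, five samples one second apart:
#   for i in 1 2 3 4 5; do pstack "$pid" | top_frames; sleep 1; done | stuck_threads 5
```

Threads that are simply idle, for example blocked in epoll_wait or sleep as in the dump above, will also be reported, so the result is best cross-checked against the per-thread CPU usage from top -Hp.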

A note on strace

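strace is commonly used to trace the system calls a process makes and the signals it receives. On Linux, a process cannot access hardware devices directly. When it needs to (for example, to read a file from disk or receive network data), it must switch from user mode to kernel mode and go through a system call. strace can trace the system calls a process makes, including their arguments, return values, and elapsed time.
Example: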

$ strace cat /dev/null
execve("/bin/cat", ["cat", "/dev/null"], [/* 22 vars */]) = 0
brk(0)                                  = 0xab1000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f29379a7000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
...

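Each line is one system call: the function name and its arguments are to the left of the equals sign, and the return value is to the right. strace displays the arguments and return values of these calls in symbolic form. It receives its information from the kernel and does not require the kernel to be built in any special way.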
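Because every line has the same "name(args) = result" shape, the output is easy to filter. The sketch below lists the calls that failed, assuming plain strace output as in the example above (no PID or timestamp prefix); failed_syscalls is a hypothetical helper name.

```shell
# failed_syscalls (hypothetical helper): read strace output on stdin and
# print "<syscall> <errno name>" for every call that returned -1.
failed_syscalls() {
  awk '
    / = -1 / {
      name = $0
      sub(/\(.*/, "", name)        # keep the text before the first "("
      err = $0
      sub(/.* = -1 /, "", err)     # keep the text after "= -1 "
      sub(/ .*/, "", err)          # keep only the symbolic errno name
      print name, err
    }
  '
}

# Possible usage (strace writes its trace to stderr by default):
#   strace cat /dev/null 2>&1 | failed_syscalls
```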

Tracing an executable

strace -f -F -o ~/straceout.txt myserver

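The -f and -F options tell strace to also trace processes created by fork and vfork (in recent strace versions, -F is kept only as a deprecated alias of -f, so -f alone is enough). The -o option writes all strace output to ~/straceout.txt, and myserver is the program to start and debug.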
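When -f is combined with -o, each line in the output file starts with the ID of the process that made the call, so the file can be summarized per process. The sketch below assumes that "PID syscall(...)" layout; syscalls_per_pid is a hypothetical helper name.

```shell
# syscalls_per_pid (hypothetical helper): read "strace -f -o" output on
# stdin and print "<pid> <number of system call lines>", sorted by PID.
# Signal lines ("--- SIG... ---") and exit lines ("+++ ... +++") are skipped.
syscalls_per_pid() {
  awk '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[a-z_][a-z0-9_]*\(/ { count[$1]++ }
    END { for (pid in count) print pid, count[pid] }
  ' | LC_ALL=C sort -n
}

# Possible usage:
#   syscalls_per_pid < ~/straceout.txt
```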

Tracing a running service

strace -o output.txt -T -tt -e trace=all -p 28979

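This traces all system calls of process 28979 (-e trace=all), records the time spent in each call (-T) and the start time of each call in human-readable hour:minute:second form (-tt), and saves the result to output.txt.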
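With -T, each line ends with the time spent in the call in angle brackets, which makes it possible to pull out the slow calls afterwards. The sketch below assumes lines of the form "HH:MM:SS.usec name(args) = result <seconds>", as produced by the -tt -T combination above; slow_syscalls is a hypothetical helper name.

```shell
# slow_syscalls (hypothetical helper): read "strace -tt -T" output on stdin
# and print "<seconds> <start time> <syscall>" for every call that took at
# least $1 seconds, slowest first.
slow_syscalls() {
  awk -v limit="$1" '
    /<[0-9.]+>$/ {
      t = $NF
      gsub(/[<>]/, "", t)          # "<0.000020>" becomes "0.000020"
      name = $2
      sub(/\(.*/, "", name)        # keep the text before the first "("
      if (t + 0 >= limit + 0) print t, $1, name
    }
  ' | LC_ALL=C sort -rn
}

# Possible usage, calls that took 100 ms or longer:
#   slow_syscalls 0.1 < output.txt
```

A thread that is blocked rather than spinning tends to show up here as a single very long call, which helps tell a hang from a busy loop.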

 