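When CPU usage hits 100% while memory usage looks normal, the likely suspect is an infinite loop somewhere in the code.
Use the top command to find the process with high CPU usage, then run top -Hp pid to see the CPU usage of each thread in that process. This narrows the problem down to a specific thread.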
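As a scriptable complement to the interactive top -Hp view, the per-thread check could be sketched roughly as follows. The function name hot_threads is made up for illustration. Note that ps reports CPU time averaged over each thread's lifetime, while top shows recent usage, so the ranking may differ for a thread that only started spinning recently.

```shell
# hot_threads PID [N]: print the N busiest threads of a process (default 5)
# as "TID %CPU" lines, busiest first. Hypothetical helper, not a standard tool.
hot_threads() {
    pid=$1
    n=${2:-5}
    # -L lists one row per thread; "tid=" and "pcpu=" suppress the header line.
    ps -L -o tid=,pcpu= -p "$pid" | LC_ALL=C sort -k2,2nr | head -n "$n"
}
```

Usage: hot_threads 4551 3. The TID column holds the same number that pstack prints as LWP, so a hot thread can be matched directly to its stack.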
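pstack prints the stack trace of every thread in a process, which makes it very useful when troubleshooting process problems. For example, if a service looks busy all the time but never makes progress (it appears hung, as if stuck in an infinite loop), pstack usually pinpoints the problem quickly. Run pstack several times over a short period; if the stack keeps stopping at the same location, that location deserves close attention and is very likely where the problem lies.
pstack must be run by the owner of process $pid or by root.
Example: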
$ pstack 4551
Thread 7 (Thread 1084229984 (LWP 4552)):
#0 0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1 0x00000000006f0730 in ub::EPollEx::poll ()
#2 0x00000000006f172a in ub::NetReactor::callback ()
#3 0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 6 (Thread 1094719840 (LWP 4553)):
#0 0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1 0x00000000006f0730 in ub::EPollEx::poll ()
#2 0x00000000006f172a in ub::NetReactor::callback ()
#3 0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 5 (Thread 1105209696 (LWP 4554)):
#0 0x000000302b80baa5 in __nanosleep_nocancel ()
#1 0x000000000079e758 in comcm::ms_sleep ()
#2 0x00000000006c8581 in ub::UbClientManager::healthyCheck ()
#3 0x00000000006c8471 in ub::UbClientManager::start_healthy_check ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 4 (Thread 1115699552 (LWP 4555)):
#0 0x000000302b80baa5 in __nanosleep_nocancel ()
#1 0x0000000000482b0e in armor::armor_check_thread ()
#2 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#3 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 3 (Thread 1126189408 (LWP 4556)):
#0 0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2 0x000000000044c972 in Business_config_manager::run ()
#3 0x0000000000457b83 in Thread::run_thread ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 2 (Thread 1136679264 (LWP 4557)):
#0 0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2 0x00000000004524bb in Process_thread::sleep_period ()
#3 0x0000000000452641 in Process_thread::run ()
#4 0x0000000000457b83 in Thread::run_thread ()
#5 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#6 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#7 0x0000000000000000 in ?? ()
Thread 1 (Thread 182894129792 (LWP 4551)):
#0 0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2 0x0000000000420d79 in Ad_preprocess::run ()
#3 0x0000000000450ad0 in main ()
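The advice to run pstack several times and look for a stack that never moves could be automated roughly as below. This is only a sketch: top_frames and sample_stacks are hypothetical helper names, and the parsing assumes the pstack output layout shown above (a "Thread ... (LWP n)" header followed by "#0 ..." frame lines).

```shell
# top_frames: read pstack output on stdin and print "LWP function" for the
# innermost frame (#0) of every thread.
top_frames() {
    awk '
        /^Thread / {
            # Pull the thread id out of "(LWP 4552)".
            if (match($0, /LWP [0-9]+/)) lwp = substr($0, RSTART + 4, RLENGTH - 4)
        }
        /^#0 / {
            # Frame lines usually read "#0 ADDR in FUNC (...)"; fall back to the
            # second field when the address is omitted.
            fn = ($3 == "in") ? $4 : $2
            print lwp, fn
        }
    '
}

# sample_stacks PID [COUNT] [INTERVAL]: take COUNT pstack samples INTERVAL
# seconds apart and count how often each thread was seen in each function.
sample_stacks() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        pstack "$pid" | top_frames
        i=$((i + 1))
        sleep "$interval"
    done | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr
}
```

Usage: sample_stacks 4551 10 1. A thread that shows the same function in all 10 samples and also ranks high in top -Hp is the one to inspect first; threads parked in sleep or epoll_wait, like those in the output above, are normally idle rather than spinning.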
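A note on strace
strace is commonly used to trace the system calls a process makes and the signals it receives. On Linux, a process cannot access hardware devices directly. When it needs to (for example, to read a file from disk or receive network data), it must switch from user mode to kernel mode and go through a system call. strace can trace the system calls a process makes, including their arguments, return values, and the time spent in each call.
Example: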
$ strace cat /dev/null
execve("/bin/cat", ["cat", "/dev/null"], [/* 22 vars */]) = 0
brk(0) = 0xab1000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f29379a7000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
...
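Each line is one system call: the system call name and its arguments are to the left of the equals sign, and the return value is to the right. strace displays the arguments and return values in symbolic form. It gets its information from the kernel and does not require the kernel to be built in any special way.
Tracing an executable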
strace -f -F -o ~/straceout.txt myserver
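The -f -F options tell strace to also trace processes created by fork and vfork, -o writes all strace output to ~/straceout.txt, and myserver is the program to start and debug. On recent strace versions -F is an obsolete synonym for -f, so -f alone should be enough.
Tracing a running service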
strace -o output.txt -T -tt -e trace=all -p 28979
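This traces all system calls of process 28979 (-e trace=all), records the time spent in each call (-T) and the time of day at which each call started (-tt, printed as HH:MM:SS with microseconds), and saves the result to output.txt.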
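Once output.txt exists, the per-call durations that -T appends in angle brackets can be totalled per system call to see where the time goes. The sketch below assumes the line layout produced by the -tt -T command above (timestamp, then the call, then a trailing "<seconds>"). The helper name slow_syscalls is made up for illustration.

```shell
# slow_syscalls FILE: sum the "<seconds>" durations written by strace -T,
# grouped by system call, and print "total_seconds calls name", slowest first.
slow_syscalls() {
    awk '
        match($0, /<[0-9.]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)   # strip "<" and ">"
            name = $2                                    # $1 is the -tt timestamp
            sub(/\(.*/, "", name)                        # keep only the call name
            total[name] += secs
            calls[name]++
        }
        END {
            for (n in total) printf "%.6f %d %s\n", total[n], calls[n], n
        }
    ' "$1" | LC_ALL=C sort -k1,1nr
}
```

Usage: slow_syscalls output.txt. A large total for a blocking call such as epoll_wait or nanosleep generally just means the process was idle there. For a quick overview without post-processing, strace also has a built-in summary mode, -c.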
 