.. highlight:: awk

SystemTap
=========

Overview
--------

SystemTap probes, traces, and analyzes a running kernel and running user-space processes in real time.

How it works
------------

SystemTap builds on the kprobe interface provided by Linux: it translates the system behaviour the user wants to observe into C code (which uses kprobes), compiles that code into a kernel object, and runs it. SystemTap is event-driven: it installs probes to monitor the system, and whenever a probe event occurs the corresponding handler code runs. Handlers are written in SystemTap's own scripting language.

.. image:: /images/systemtap-prepare-workflow.gif

+ stap first parses the script into a parse tree (pass 1).
+ The elaboration step (pass 2) then resolves symbols using symbol information about the currently running kernel.
+ Next, the translation step converts the parse tree into C source code (pass 3), using the resolved information and the tapset scripts (the library shipped with SystemTap, which provides useful functions).
+ Finally, stap builds the kernel module using the local kernel module build process (pass 4).

.. image:: /images/systemtap-run-workflow.gif

::

    $ stap -ve 'probe begin { log("hello world") exit() }'
    Pass 1: parsed user script and 76 library script(s) using 92476virt/22592res/2616shr kb, in 80usr/0sys/85real ms.
    Pass 2: analyzed script: 1 probe(s), 2 function(s), 0 embed(s), 0 global(s) using 93000virt/23472res/2816shr kb, in 0usr/0sys/4real ms.
    Pass 3: translated to C into "/tmp/stapbx8Gpk/stap_7703b9bd08bd359932cf8da12019f6d8_813.c" using 93000virt/23628res/2964shr kb, in 0usr/0sys/0real ms.
    Pass 4: compiled C into "stap_7703b9bd08bd359932cf8da12019f6d8_813.ko" in 3240usr/510sys/4048real ms.
    Pass 5: starting run.
    hello world
    Pass 5: run completed in 10usr/10sys/600real ms.

On the second run, SystemTap reuses the cached results of passes 3 and 4.

::

    $ stap -ve 'probe begin { log("hello world") exit() }'
    Pass 1: parsed user script and 76 library script(s) using 92476virt/22592res/2616shr kb, in 80usr/10sys/85real ms.
    Pass 2: analyzed script: 1 probe(s), 2 function(s), 0 embed(s), 0 global(s) using 93000virt/23472res/2816shr kb, in 0usr/0sys/4real ms.
    Pass 3: using cached /home/dirlt/.systemtap/cache/77/stap_7703b9bd08bd359932cf8da12019f6d8_813.c
    Pass 4: using cached /home/dirlt/.systemtap/cache/77/stap_7703b9bd08bd359932cf8da12019f6d8_813.ko
    Pass 5: starting run.
    hello world
    Pass 5: run completed in 0usr/10sys/586real ms.
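To see where the time goes in the verbose output above, the per-pass timings can be pulled out with a small shell helper. This is only a sketch: the function name ``pass_times`` is made up for illustration, and it assumes that every timed line ends with the ``usr/sys/real ms.`` summary in the same form as the sample output.

```shell
#!/bin/sh
# pass_times (hypothetical helper): read `stap -v` output on stdin and print
# "<pass number> <real ms>" for each line that ends with a timing summary.
# Lines without timings (e.g. "Pass 5: starting run.") are skipped.
pass_times() {
    sed -n 's/^Pass \([0-9]\):.* in *[0-9]* *usr\/[0-9]*sys\/\([0-9]*\)real ms\.$/\1 \2/p'
}

# Sample input: the first (uncached) run shown above.
pass_times <<'EOF'
Pass 1: parsed user script and 76 library script(s) using 92476virt/22592res/2616shr kb, in 80usr/0sys/85real ms.
Pass 2: analyzed script: 1 probe(s), 2 function(s), 0 embed(s), 0 global(s) using 93000virt/23472res/2816shr kb, in 0usr/0sys/4real ms.
Pass 3: translated to C into "/tmp/stapbx8Gpk/stap_7703b9bd08bd359932cf8da12019f6d8_813.c" using 93000virt/23628res/2964shr kb, in 0usr/0sys/0real ms.
Pass 4: compiled C into "stap_7703b9bd08bd359932cf8da12019f6d8_813.ko" in 3240usr/510sys/4048real ms.
Pass 5: starting run.
hello world
Pass 5: run completed in 10usr/10sys/600real ms.
EOF
```

For this input the helper prints ``1 85``, ``2 4``, ``3 0``, ``4 4048`` and ``5 600``, one pair per line. Pass 4 (compiling the module) takes about 4.0 s of the roughly 4.7 s total, which is the cost the cache avoids on the second run.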
Installation
------------

**CentOS**

+ ``yum install systemtap kernel-devel``
+ Download *kernel-debuginfo* and *kernel-debuginfo-common* from http://debuginfo.centos.org/ and install them with ``rpm -Uhv kernel-debuginfo-*rpm``. Run ``uname -r`` to find out which version to download.

Reference: http://sourceware.org/systemtap/wiki/SystemTapOnCentOS

**Ubuntu** (a little troublesome)

Reference: http://posulliv.github.io/2010/02/26/installing-stap/

Basic usage
-----------

1. Run ``stap -L PROBE`` to list the matching probes and the variables available in their context.

   ::

       $ stap -L 'syscall.open'
       syscall.open name:string filename:string flags:long mode:long argstr:string $filename:char const* $flags:int $mode:int

2. Write a script:

   ::

       probe syscall.open {
           // do your custom actions
       }

3. Run ``stap /path/to/script`` to start probing.

For more detail, see the official documentation: https://sourceware.org/systemtap/documentation.html

Examples
--------

1. Who deleted my file?

   ::

       global pid2cmdline

       // Remember the command line of every process that calls exec.
       probe syscall.exec* {
           pid2cmdline[pid()] = argstr
       }

       // Forget the entry when the process exits.
       probe kernel.function("sys_exit") {
           delete pid2cmdline[pid()]
       }

       // On unlink/unlinkat of a path containing "core",
       // print the chain of ancestor processes up to init (pid 1).
       probe syscall.unlink* {
           nm = user_string($pathname);
           if (isinstr(nm, "core")) {
               printf("unlink(at) %s\n", nm)
               task = task_current()
               while (task_pid(task) != 1) {
                   printf("%d\t%s\t%s\n", task_pid(task), task_execname(task), pid2cmdline[task_pid(task)])
                   task = task_parent(task)
               }
               printf("------------------------\n")
           }
       }

2. A system-wide strace

   ::

       $ stap -e 'probe syscall.* {printf("%d %s %s %s\n", pid(), execname(), name, argstr)}'
       32657 nginx recvfrom 22, 0x7fffba367eef, 1, MSG_PEEK, 0x0, 0x0
       18521 direwolf poll 0x7fff58d0ab60, 1, 4000
       6633 top read 8, 0x3cdac118a0, 1023
       32657 nginx epoll_wait 20, 0x21cbee0, 512, 2000
       ...

3. Who started or killed my process?

   ::

       probe syscall.exec* {
           printf("execve %s %s\n", execname(), argstr)
       }

       probe signal.send {
           if (sig_name == "SIGKILL" || sig_name == "SIGTERM")
               printf("%s was sent to %s (pid:%d) by %s uid:%d\n", sig_name, pid_name, sig_pid, execname(), uid())
       }

4. What is the process doing?

   ::

       probe process("/path/to/nginx").function("*") {
           printf("%s(%s)\n", probefunc(), $$parms)
       }

Performance analysis with SystemTap
-----------------------------------

**Two steps**

1. Sample the process's backtraces and count them.
2. Visualize the samples with FlameGraph [#fg]_.

C-level profiling

::

    global s;
    global quit = 0;

    // Sample the user-space backtrace of the target process on every profiling tick.
    probe timer.profile {
        if (pid() == target()) {
            if (quit) {
                // Dump every distinct stack with its sample count, then stop.
                foreach (i in s-) {
                    print_ustack(i);
                    printf("\t%d\n", @count(s[i]));
                }
                exit()
            } else {
                s[ubacktrace()] <<< 1;
            }
        }
    }

    // Stop sampling after 20 seconds.
    probe timer.s(20) {
        quit = 1
    }

Sample output of the script:

::

    0x3cda6e86f3 : __epoll_wait_nocancel+0xa/0x67 [/lib64/libc-2.12.so]
    0x433f49 : ngx_epoll_process_events+0x3b/0x409 [/usr/local/nginx/sbin/nginx]
    0x4260d2 : ngx_process_events_and_timers+0xd6/0x165 [/usr/local/nginx/sbin/nginx]
    0x432650 : ngx_worker_process_cycle+0x161/0x285 [/usr/local/nginx/sbin/nginx]
    0x42e046 : ngx_spawn_process+0x642/0x991 [/usr/local/nginx/sbin/nginx]
    0x431885 : ngx_start_worker_processes+0x93/0x100 [/usr/local/nginx/sbin/nginx]
    0x430fd6 : ngx_master_process_cycle+0x282/0x8b8 [/usr/local/nginx/sbin/nginx]
    0x40397a : main+0x538/0x53f [/usr/local/nginx/sbin/nginx]
    0x3cda61ecdd : __libc_start_main+0xfd/0x1d0 [/lib64/libc-2.12.so]
    0x4032a9 : _start+0x29/0x2c [/usr/local/nginx/sbin/nginx]
        46
    0x3cdaa0e4d0 : __write_nocancel+0x7/0x57 [/lib64/libpthread-2.12.so]
    0x44d7f8 : ngx_write_fd+0x28/0x2a [/usr/local/nginx/sbin/nginx]

Finally, generate the flame graph:

::

    $ stackcollapse-stap.pl a.bt | flamegraph.pl > a.svg

.. image:: /images/dtools-flamegraph.svg
   :width: 800px

To profile a dynamic, high-level language such as Python or PHP, you have to build the language-level backtrace yourself. The core code for walking the Python execution stack looks like this:

::

    global bts

    probe timer.profile {
        if (pid() == target()) {
            _current = @var("_PyThreadState_Current@Python/pystate.c")
            if (_current) {
                bt = ""
                f = @cast(_current, "PyThreadState")->frame
                // Walk the frame chain from the innermost frame outwards.
                // PyString_As_String is a helper that is not shown in this excerpt.
                while (f != 0) {
                    filename = PyString_As_String(@cast(f, "PyFrameObject")->f_code->co_filename)
                    name = PyString_As_String(@cast(f, "PyFrameObject")->f_code->co_name)
                    lineno = @cast(f, "PyFrameObject")->f_code->co_firstlineno
                    bt .= sprintf("%s:%d %s\n", filename, lineno, name)
                    f = @cast(f, "PyFrameObject")->f_back;
                }
                bts[bt] <<< 1
            }
        }
    }

.. [#fg] https://github.com/brendangregg/FlameGraph

References
----------

+ SystemTap Tapset Reference Manual
+ Flame Graphs for Online Performance Profiling
+ Playing with ptrace
+ Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps
+ 玩转utrace (Playing with utrace)
+ How Debuggers Work
+ C/C++ Debugging/Tracing/Profiling
+ http://bugs.python.org/issue4111
+ Fedora 13 Spotlight Feature: Exploring New Frontiers of Python Development
+ http://sourceware.org/systemtap/wiki/PythonMarkers
+ http://dirlt.com/systemtap.html
+ 巧用Systemtap注入延迟模拟IO设备抖动 (Using SystemTap to inject latency and simulate I/O device jitter)