stevepedwards.com/DebianAdmin linux mint IT admin tips info

Flame Graphs by www.brendangregg.com – Intro to System Profiling and Admin Tools

"Profiling builds a picture of a target that can be studied and understood... by sampling the system state at timed intervals... provides a coarse view of activity, depending on sampling rate." (Systems Performance, p. 30, Brendan Gregg)

Flame Graphs are a stroke of genius and an advanced topic, and I discovered them only today. They give an insight into the inner workings of an OS kernel, so they prove invaluable for kernel or user level fault finding, and for performance profiling of the system itself or of the software running on it. The topic seems very complex to start with (and it is, for a professional level understanding!), but you can learn a lot in a few hours if you watch Brendan's videos and get through the first steps: downloading and running his well thought out, script-generous examples, which I have worked through below from his site. YOU NEED to be ROOT - sudo won't do!

http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#perf

Don't be put off by the apparent technical complexity of the info below - I have done the initial research, hit the hurdles and got over them for you!

[Image: perfconflicts.png]

If you can get them created and view them in a browser, you can learn the technical side of their value later by watching Brendan's video presentations. perf is included in the linux-tools packages, including the one that matches your kernel version (e.g. linux-tools-3.16.0-39-generic below).

(I had dependency problems with the -38 kernel on my PCs, but not with -39 on the laptop.)

uname -a
Linux AMD 3.16.0-39-generic

apt-get update

apt-get upgrade

apt-get install linux-tools-common linux-tools-generic linux-tools-3.16.0-39-generic 
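On Ubuntu-based systems such as Mint, the kernel-specific perf package follows the pattern linux-tools-<release>, where <release> is the string printed by uname -r (3.16.0-39-generic here). The short Python sketch below assumes that naming scheme and builds the same apt-get line for whichever kernel you are running; it only prints the command, it does not install anything.

```python
import platform


def linux_tools_packages(release: str) -> list[str]:
    """Return the perf packages for a kernel release such as '3.16.0-39-generic'.

    Assumes the Ubuntu/Mint naming scheme linux-tools-<release>.
    """
    return ["linux-tools-common", "linux-tools-generic", f"linux-tools-{release}"]


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`
    print("apt-get install " + " ".join(linux_tools_packages(platform.release())))
```

If uname -r reports a kernel with no matching linux-tools package in your repositories, that is the kind of dependency problem mentioned above.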

apt-get install git

git clone https://github.com/brendangregg/FlameGraph
Cloning into 'FlameGraph'...
remote: Counting objects: 695, done.
remote: Total 695 (delta 0), reused 0 (delta 0), pack-reused 695
Receiving objects: 100% (695/695), 1.09 MiB | 463.00 KiB/s, done.
Resolving deltas: 100% (383/383), done.
Checking connectivity... done.

Now try Brendan's examples:

cd FlameGraph

perf record -F 99 -a -g -- sleep 60

[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 2.051 MB perf.data (~89589 samples) ]
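The perf record flags used here: -F 99 samples at 99 Hertz, -a samples all CPUs, -g records call graphs (stack traces), and sleep 60 just sets how long to record. Brendan uses 99 rather than 100 to avoid sampling in lockstep with other periodic activity. As a rough sanity check, the number of samples is bounded by frequency x seconds x CPUs. A Python sketch of that arithmetic (an upper bound only: idle CPUs may produce fewer samples, and perf's own "~" figure above is itself an estimate):

```python
import os


def max_samples(freq_hz: int, seconds: float, cpus: int) -> int:
    """Upper bound on samples for `perf record -F freq_hz -a -g -- sleep seconds`.

    Each CPU is sampled at about freq_hz; real counts can be lower,
    for example when an idle CPU generates no events.
    """
    return int(freq_hz * seconds * cpus)


if __name__ == "__main__":
    ncpu = os.cpu_count() or 1
    print(f"{ncpu} CPUs -> at most ~{max_samples(99, 60, ncpu)} samples in 60 s")
```

The same bound applies to the later perf record -a -g -F 997 sleep 10 run: roughly ten times the rate for one sixth of the time.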

perf script | ./stackcollapse-perf.pl > out.perf-folded

Failed to open /tmp/perf-14384.map, continuing without symbols...
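stackcollapse-perf.pl turns each multi-line stack from perf script into one "folded" line: the process name, then the frames joined root-first with semicolons, then a count of how many identical stacks were seen. The Python below is a simplified sketch of that idea, not a port of Brendan's script. It assumes the common perf script layout (a header line starting with the command name, indented "address symbol (dso)" frame lines listed leaf first, a blank line between samples) and ignores details the real script handles, such as process or symbol names containing spaces, pid/tid options and inlined frames.

```python
from collections import Counter


def collapse_perf_script(text: str) -> Counter:
    """Fold perf-script-style stacks into 'comm;root;...;leaf' -> sample count."""
    folded = Counter()
    comm, frames = None, []
    # The extra "" guarantees the last sample is flushed.
    for line in text.splitlines() + [""]:
        if not line.strip():
            # Blank line ends a sample; frames were listed leaf first, so reverse.
            if comm is not None and frames:
                folded[";".join([comm] + frames[::-1])] += 1
            comm, frames = None, []
        elif not line[0].isspace():
            # Header line, e.g. "swapper  0 [000] 100.0: cycles:"
            comm = line.split()[0]
        else:
            # Frame line, e.g. "ffffffff8101c3c6 native_safe_halt ([kernel.kallsyms])"
            parts = line.split()
            sym = parts[1] if len(parts) > 1 else "[unknown]"
            frames.append(sym.split("+0x")[0])  # drop any "+0x1f" offset
    return folded


def to_folded_lines(folded: Counter) -> str:
    """Render as 'stack count' lines, the input format flamegraph.pl reads."""
    return "\n".join(f"{stack} {n}" for stack, n in sorted(folded.items()))
```

For example, you could save perf script output to a file, pass its text to collapse_perf_script(), and write to_folded_lines() to a file for ./flamegraph.pl (the function names here are hypothetical, chosen for this sketch).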

./flamegraph.pl out.perf-folded > perf-kernel.svg

[Image: flamerw.png]

This SVG contains active code (embedded JavaScript), which WordPress treats as a security risk, so I can't upload it here. Create your own with the commands below, or see the demo folder, and view the result in a browser.

grep -v cpu_idle out.perf-folded | ./flamegraph.pl > nonidle.svg

grep ext4 out.perf-folded | ./flamegraph.pl > ext4internals.svg

egrep 'system_call.*sys_(read|write)' out.perf-folded | ./flamegraph.pl > rw.svg
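These filters work because every line of out.perf-folded holds one complete stack plus its sample count, so grep keeps or drops whole stacks and flamegraph.pl simply redraws what is left. The same filtering in Python, as a sketch with hypothetical function names: filter_folded mirrors grep and grep -v, and total_samples adds up the counts so you can see what share of the profile survives a filter.

```python
import re


def filter_folded(lines: list[str], pattern: str, invert: bool = False) -> list[str]:
    """Keep folded 'stack count' lines that match pattern (grep), or don't (grep -v)."""
    rx = re.compile(pattern)
    return [ln for ln in lines if bool(rx.search(ln)) != invert]


def total_samples(lines: list[str]) -> int:
    """Sum the trailing sample counts of folded lines."""
    return sum(int(ln.rsplit(" ", 1)[1]) for ln in lines)


if __name__ == "__main__":
    # Made-up folded lines, just to show the three filters above.
    folded = [
        "swapper;cpu_startup_entry;cpu_idle;default_idle 50",
        "dd;system_call;sys_read;vfs_read 30",
        "dd;system_call;sys_write;ext4_file_write 20",
    ]
    print(total_samples(filter_folded(folded, "cpu_idle", invert=True)))
```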

The ./stackcollapse-perf.pl script itself has comments on what to do if stack traces are missing:

less ./stackcollapse-perf.pl

# The output of "perf script" should include stack traces. If these are missing
# for you, try manually selecting the perf script output; eg:
#
# perf script -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace | ...

The directory contents are now:

[Image: Flamegraphdir.png]

getfacl FlameGraph/
# file: FlameGraph/
# owner: root
# group: root
user::rwx
group::r-x
other::r-x

Full system-wide "perf stat" access is available only to root, because Mint's kernel.perf_event_paranoid setting is 1:

sudo cat /proc/sys/kernel/perf_event_paranoid
1

If you run as a user, you get:

perf stat -B -ecycles:u,instructions:u -a dd if=/dev/zero of=/dev/null count=2000000
Error:
You may not have permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
-1 - Not paranoid at all
0 - Disallow raw tracepoint access for unpriv
1 - Disallow cpu events for unpriv
2 - Disallow kernel profiling for unpriv
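perf_event_paranoid is a sysctl, so the current level can be read from /proc (normally without sudo). A small Python sketch that reads it and labels it with the four meanings listed in the perf message above. The dictionary covers only those levels; newer kernels or distributions may use other defaults or additional levels, which this sketch simply reports as unlisted.

```python
from pathlib import Path
from typing import Optional

# Meanings copied from the perf error message above (older kernels).
PARANOID_LEVELS = {
    -1: "Not paranoid at all",
    0: "Disallow raw tracepoint access for unpriv",
    1: "Disallow cpu events for unpriv",
    2: "Disallow kernel profiling for unpriv",
}


def read_paranoid(path: str = "/proc/sys/kernel/perf_event_paranoid") -> Optional[int]:
    """Return the current perf_event_paranoid level, or None if it can't be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_paranoid(level: int) -> str:
    return PARANOID_LEVELS.get(level, "Level not listed in this perf message")


if __name__ == "__main__":
    level = read_paranoid()
    if level is None:
        print("perf_event_paranoid not available")
    else:
        print(f"perf_event_paranoid = {level}: {describe_paranoid(level)}")
```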

So what are Flame Graphs, what do they show and how are they used?

You can see the basis of the idea in the branching (bifurcation) of call stacks in the sample view: perf record writes the samples to the perf.data file, which perf report then reads back and displays as a call tree:

perf record -a -g -F 997 sleep 10

perf report --stdio

[Image: perfrptbifur.png]
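A flame graph draws the same call tree as stacked boxes: each box is a function at a given position in the stack, callers sit below their callees, and a box's width is proportional to the number of samples whose stacks pass through it along that path from the root. The x-axis is sorted alphabetically, not by time. The Python sketch below computes those per-path totals (the box widths) from folded lines; it illustrates the idea and is not how flamegraph.pl is implemented internally.

```python
from collections import defaultdict


def frame_totals(folded_lines: list[str]) -> dict[tuple[str, ...], int]:
    """Map each root-to-frame path to its total sample count (its box width)."""
    totals: dict[tuple[str, ...], int] = defaultdict(int)
    for line in folded_lines:
        stack, count = line.rsplit(" ", 1)
        frames = tuple(stack.split(";"))
        # Every prefix of the stack is an ancestor box that this sample widens.
        for depth in range(1, len(frames) + 1):
            totals[frames[:depth]] += int(count)
    return dict(totals)


def width_fraction(totals: dict[tuple[str, ...], int], path: tuple[str, ...]) -> float:
    """Fraction of all samples spent in the box at `path` (1.0 = full width)."""
    grand_total = sum(n for p, n in totals.items() if len(p) == 1)
    return totals.get(path, 0) / grand_total
```

With folded lines "bash;main;read_input 30", "bash;main;compute 60" and "swapper;cpu_idle 10", the bash and bash;main boxes each span 90% of the width, and compute sits on top of main at 60%: the same bifurcation perf report shows, drawn as widths.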

There are more videos by Brendan on YouTube covering this topic and others; his talks on performance tools are especially important for admins.

A list of events for possible analysis can be viewed via:

sudo perf list | wc -l
1488

[Image: perlist.png]

Examples on Brendan's perf page: http://www.brendangregg.com/perf.html

Now you just have to find a PC/server with a performance issue and apply the knowledge! Easy, right?! Hmm... make sure you charge a fortune if you do fix it this way... you deserve it!

Welcome to the mind-bending world of hard-core OS tech... It's mostly beyond me, as I'm a shit programmer, but given the video examples I see no good reason why anyone couldn't track down a given problem with this tool, say a memory-leaking program, or one that is too resource-heavy because of bugs or bad code. Finding the fault is the first step to a fix...
