stevepedwards.com/DebianAdmin linux mint IT admin tips info

Gnuplot – Basic Ideas to Capture Live System Data

The last Post looked at log-captured data from sar/sadf for Gnuplot, so I want to extend those ideas to live capture using dstat, as it has some useful switches.

Dstat shows rolling output, updating every second by default, with options to choose which stats appear.

Using tee, it is possible to view the output live AND store it in a file:

stevee@AMDA8 ~ $ dstat -tc | tee dstatlive.txt
----system---- ----total-cpu-usage----
time |usr sys idl wai hiq siq
18-08 16:14:11| 8 1 89 1 0 0
18-08 16:14:12| 18 4 78 0 0 0

stevee@AMDA8 ~ $ cat dstatlive.txt
----system---- ----total-cpu-usage----
time |usr sys idl wai hiq siq
18-08 16:14:11| 8 1 89 1 0 0
18-08 16:14:12| 18 4 78 0 0 0

It would be great if rolling output could be piped straight into Gnuplot without constructing complicated command lines, giving an auto-formatted graph by date of whatever numerical data exists. In Linux fashion, though, you have to start simply and build up. The ethos is to combine simple but effective tools interactively, which gives you the command line power and flexibility to suit any circumstance. You just have to research and use some imagination to get there, which can be fun or frustrating depending on how much of a hurry you are in and how much you already know.

I experimented with various unnecessarily complex pipes and learned a lot along the way: awk's line-skipping condition (NR>3); the need to flush buffers when piping "live" output; and that the dstat columns can pick up an extra 0 when passed through awk, which shows up if dstat is run with and without awk in two terminals side by side (sys values go from 2 to 20 etc). I finally settled on:

dstat -tc | awk '{if(NR>3) print $1,$2,$3,$4 fflush()}' | tee dstatlive.txt
18-08 16:24:26| 37 80
18-08 16:24:27| 37 60
18-08 16:24:28| 22 50 

0add.png

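Only the time/date and usr data columns are correct above - the sys column has the extra 0, SO beware when piping that the results you get are what they should be! If another $5 is added, the sys column is corrected and the next column along gets the appended 0. The cause is the missing separator before fflush(): awk reads "$4 fflush()" as string concatenation, so the return value of fflush() (0 on success) is glued onto the last field printed. Writing print $1,$2,$3,$4; fflush() with a semicolon avoids it.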
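A minimal sketch of the same filter with print and fflush() written as two statements, so that nothing is appended to the last field. The function name cpu_columns is made up for illustration, and the input is the sample dstat output shown earlier rather than a live run. NR > 3 is kept from the original command, so with this four-line sample only the last line is printed:

```shell
# Hypothetical helper (name is illustrative): keep date, time, usr and sys.
# The ';' ends the print statement, so fflush() runs on its own and its
# return value is not concatenated onto $4.
cpu_columns() {
    awk 'NR > 3 { print $1, $2, $3, $4; fflush() }'
}

# Feed it the sample dstat -tc output from above instead of a live run.
printf '%s\n' \
    '----system---- ----total-cpu-usage----' \
    'time |usr sys idl wai hiq siq' \
    '18-08 16:14:11| 8 1 89 1 0 0' \
    '18-08 16:14:12| 18 4 78 0 0 0' | cpu_columns
# prints: 18-08 16:14:12| 18 4
```

In a live capture the function would sit in the middle of the pipe: dstat -tc | cpu_columns | tee dstatlive.txt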

Now you have a graph of some recent "live" data. "Why? What is the difference between this and a recent log value?" I hear you ask. Well, it's about learning Linux tools first, but you may also need to provide a quick data sample for a boss, and not want to dig through application logs or know which logs hold the data. Knowing a few live data profiling tools, you may be able to knock up a chart showing a specific problem happening "now", compared to the baseline references you have - of course! - already taken for the system you're responsible for...

You can easily amend the gnuplot files from the prior Post to reflect the new date/time format and files in this data:

set title 'dstat utils'
set autoscale
set terminal png large size 1280,960
set xdata time
set xlabel 'Time of Day' font 'Arial,14' offset -2,0
set ylabel '% util' font 'Arial,12'
set timefmt '%d-%m %H:%M:%S'
set xtics rotate
set format x "%d-%m %H:%M:%S"
set output 'dstat.png'
plot 'dstatlive.txt' using 1:3 title 'util %' with linespoints

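Collect data for, say, 30 secs, using awk to remove the title info. Stop it with Ctrl+C, or give dstat its delay and count arguments (e.g. dstat -tc 1 30) so it exits by itself after 30 updates: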

dstat -tc | awk '{if(NR>3) print $1,$2,$3,$4 fflush()}' | tee dstatlive.txt

18-08 17:30:50| 24 60
18-08 17:30:51| 19 30
18-08 17:30:52| 16 30...

Now run gnuplot for the new files - showing column 3 user CPU% usage:

gnuplot dstat.gnu && xdg-open dstat.png

dstat.png

Now - the MAIN thing to realise about live data is that the measurement process itself adds to the system load. Still, it's a start: research and experiment with tools that have different output formats, see what can be used in user space to give useful indications in graphical form, and learn general Linux behaviour in the process.

Usage: dstat [-afv] [options..] [delay [count]]
Versatile tool for generating system resource statistics

Dstat options:
-c, --cpu enable cpu stats
-C 0,3,total include cpu0, cpu3 and total
-d, --disk enable disk stats
-D total,hda include hda and total
-g, --page enable page stats
-i, --int enable interrupt stats
-I 5,eth2 include int5 and interrupt used by eth2
-l, --load enable load stats
-m, --mem enable memory stats
-n, --net enable network stats
-N eth1,total include eth1 and total
-p, --proc enable process stats
-r, --io enable io stats (I/O requests completed)
-s, --swap enable swap stats
-S swap1,total include swap1 and total
-t, --time enable time/date output
-T, --epoch enable time counter (seconds since epoch)
-y, --sys enable system stats

--aio enable aio stats
--fs, --filesystem enable fs stats
--ipc enable ipc stats
--lock enable lock stats
--raw enable raw stats
--socket enable socket stats
--tcp enable tcp stats
--udp enable udp stats
--unix enable unix stats
--vm enable vm stats

--plugin-name enable plugins by plugin name (see manual)
--list list all available plugins

-a, --all equals -cdngy (default)
-f, --full automatically expand -C, -D, -I, -N and -S lists
-v, --vmstat equals -pmgdsc -D total

--float force float values on screen
--integer force integer values on screen

--bw, --blackonwhite change colors for white background terminal
--nocolor disable colors (implies --noupdate)
--noheaders disable repetitive headers
--noupdate disable intermediate updates
--output file write CSV output to file

delay is the delay in seconds between each update (default: 1)
count is the number of updates to display before exiting (default: unlimited)

A simpler option is to capture load averages (with times): the extra 0 that awk appends to the 15 min load avg fraction makes no odds (1.040 is still 1.04), and no columns have to be selected:

dstat -tl | awk '{if(NR>3) print $0 fflush()}' | tee dstatlive.txt

18-08 18:30:55|0.86 1.05 1.040
18-08 18:30:56|0.86 1.05 1.040
18-08 18:30:57|0.79 1.03 1.040..

gnuplot's conf can be amended to show all 3 lines on one graph with a bit of tinkering...check it works as is with one line first.

dstat.png

In this case, it has taken the 5 min load averages as the 3rd column - the output has merged the 1 min avg with the seconds of time:

18:30:59|0.79

The "|" has to be replaced with a space, using sed, e.g.

sed 's/|/ /g' dstatlive.txt > dstat.txt

Now either delete dstatlive.txt and rename dstat.txt to take its place, or change the plot line in the gnu conf file:

plot 'dstat.txt' using 1:3 title 'util' with linespoints

Now re-run the graph:

dstat.png

cat dstat.txt | head
18-08 18:29:59 0.56 1.03 1.040
18-08 18:30:00 0.56 1.03 1.040
18-08 18:30:01 0.56 1.03 1.040
18-08 18:30:02 0.59 1.03 1.040
18-08 18:30:03 0.59 1.03 1.040
18-08 18:30:04 0.59 1.03 1.040
18-08 18:30:05 0.59 1.03 1.040
18-08 18:30:06 0.59 1.03 1.040
18-08 18:30:07 0.63 1.03 1.040

These are now correct.
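The header stripping and the "|" replacement can also be done in one awk step during capture, which saves the separate sed pass. This is only a sketch: the function name load_columns is made up, the two header lines are placeholders, and the data lines are values from the capture above with the stray trailing 0 dropped:

```shell
# Hypothetical helper (name is illustrative): skip the first three lines,
# turn the first '|' on each remaining line into a space, and flush after
# every line so tee sees the data straight away.
load_columns() {
    awk 'NR > 3 { sub(/[|]/, " "); print; fflush() }'
}

# Placeholder headers plus two lines in the dstat -tl layout used above.
printf '%s\n' \
    'header line 1' \
    'header line 2' \
    '18-08 18:30:55|0.86 1.05 1.04' \
    '18-08 18:30:56|0.86 1.05 1.04' | load_columns
# prints: 18-08 18:30:56 0.86 1.05 1.04
```

For a live run this would read: dstat -tl | load_columns | tee dstat.txt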

So how do you show all 3 lines?

plot 'dstat.txt' using 1:3 with linespoints, 'dstat.txt' using 1:4 with linespoints, 'dstat.txt' using 1:5 with linespoints

dstat.png
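The three-line plot is easier to read with a legend entry per line. The sketch below writes a separate gnuplot file from the shell; the file name dstat3.gnu and the titles are illustrative choices, the settings are the ones from the conf file earlier, and '' is gnuplot's shorthand for reusing the previous file name:

```shell
# Write a gnuplot script for the three load averages.
# dstat3.gnu and the line titles are illustrative, not from the original post.
cat > dstat3.gnu <<'EOF'
set title 'dstat load avgs'
set autoscale
set terminal png large size 1280,960
set xdata time
set timefmt '%d-%m %H:%M:%S'
set xtics rotate
set format x "%d-%m %H:%M:%S"
set output 'dstat.png'
plot 'dstat.txt' using 1:3 title '1 min' with linespoints, \
     '' using 1:4 title '5 min' with linespoints, \
     '' using 1:5 title '15 min' with linespoints
EOF
```

Run it with gnuplot dstat3.gnu once dstat.txt exists.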

What does this tell? That the 1 minute load avg has roughly doubled from 0.5 to 1.0 during the capture, rising toward the 5 and 15 minute avgs. So overall the system is returning from a recent spell of relatively low activity (before the test), below the 5-15 minute norm, to the load levels of the longer periods.

Compared to top, the 1,5,15 min load avgs show:

load average: 1.05, 0.96, 0.90

htop shows less:

htop

sar -q is also a good log source for load avgs and gnuplot practice:

sarq.png

That data could be used to look at longer time periods, but it takes some thought to remove the header lines sar inserts (uname -a at the top and RESTART some way down) - that's practice for your sed/awk line stripping skills..!

sar -q | wc -l
71

For the sake of plotting the data I'll remove the lines containing text using Notepad, leaving just data lines, the first of which is:

00:05:01            0       475      0.14      0.27      0.34         0
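The same line stripping can be done with awk instead of an editor. This is a sketch under assumptions: the function name sar_data_rows is made up, and apart from the 00:05:01 row the sample lines only mimic the kind of banner, column-header, RESTART and Average lines sar prints, so check them against your own sar -q output:

```shell
# Hypothetical helper (name is illustrative): keep only rows whose first
# field is HH:MM:SS and whose second field is a whole number. That drops
# banner, column-header, RESTART, Average and blank lines.
sar_data_rows() {
    awk '$1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 ~ /^[0-9]+$/'
}

# Illustrative sample: only the 00:05:01 row is real data from this post.
printf '%s\n' \
    'Linux (banner line)' \
    '' \
    '00:00:01      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked' \
    '00:05:01            0       475      0.14      0.27      0.34         0' \
    '07:45:10       LINUX RESTART' \
    'Average:      (summary row)' | sar_data_rows
# prints only the 00:05:01 row
```

On a real system this would be: sar -q | sar_data_rows > sarq.txt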

Columns 1,4,5,6 are required, so amending the last conf file to:

set title ' sarq load avg'
set autoscale
set terminal png large size 1280,960
set xdata time
set xlabel 'Time of Day' font 'Arial,14' offset -2,0
set ylabel 'sarq load avgs' font 'Arial,12'
set timefmt '%H:%M:%S'
set xtics rotate
set format x "%H:%M:%S"
set output 'sarq.png'
plot 'sarq.txt' using 1:4 with linespoints, 'sarq.txt' using 1:5 with linespoints, 'sarq.txt' using 1:6 with linespoints

gnuplot sarq.gnu

sarq.png

This shows the time between shutdowns with a total load avg range between 0 and 3 max. The larger red spikes are the 1 minute avg fluctuations, with the 5 and 15 min plots relatively more "compressed" over time. A plot using only the figures from today's boot period may be easier to read, as it removes the long straight links:

sarq.png

Again, the autoscaling is impressive. The good thing about doing these exercises is that it makes you think about the workings of the system and what is actually happening, so that when you glance at load avgs in future you have a better understanding of what they mean. Are the general trends toward higher or lower loads - is the PC getting busier or quieter over time? What processes/user actions are responsible for those changes? From the above plot you can see the 15 min avg telling that the laptop was busiest around 5.30pm, getting quieter after.
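Cutting the data down to the current boot period can also be scripted. The sketch below assumes that sar marks each reboot with a line containing RESTART, as noted above; the function name since_last_restart and all the sample values are made up for illustration:

```shell
# Hypothetical helper (name is illustrative): collect data rows in a buffer
# and empty the buffer at every RESTART marker, so only the rows after the
# last restart are left to print at the end.
since_last_restart() {
    awk '
        /RESTART/ { n = 0; next }
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 ~ /^[0-9]+$/ { row[++n] = $0 }
        END { for (i = 1; i <= n; i++) print row[i] }
    '
}

# Made-up sample rows in the sar -q layout, one before and one after a restart.
printf '%s\n' \
    '00:05:01 0 475 0.14 0.27 0.34 0' \
    '07:45:10 LINUX RESTART' \
    '07:55:01 1 480 0.50 0.40 0.30 0' | since_last_restart
# prints: 07:55:01 1 480 0.50 0.40 0.30 0
```

With real data, sar -q | since_last_restart > sarq.txt would leave just the rows for the latest boot period.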
