stevepedwards.com/DebianAdmin linux mint IT admin tips info

Awk/Perl/SED Notepad

 

Create yourself a copy of the awk/sed man page - it's easier to read:

man -t awk | ps2pdf - > awkman.pdf

man -t sed | ps2pdf - > sedman.pdf

From sed & awk by O'Reilly - older kernel/system example problems. The examples often paste as columns rather than lines, and major parts are assumed knowledge or missing steps, so you'll have to work it out as you go...

A better intro is Shotts':

http://linuxcommand.org/lc3_adv_awk.php

VIM - select all and delete is :%d; select all and yank (copy) is :%y - this goes into vim's register so it can be pasted into another open vim doc with Shift+P.

FS: field separator, default = whitespace (runs of spaces/tabs)

OFS: output FS, default = 1 whitespace

NF: no. of fields

NR: no. of records

RS: record separator, default = newline

ORS: output record separator, default = newline

FNR: record number within the current file (resets to 1 for each input file)
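A quick demo of those built-ins across two files (throwaway /tmp paths, just for illustration) - NR keeps counting across files while FNR resets for each one:

```shell
# Two small sample files
printf 'a b c\nd e\n' > /tmp/vars1.txt
printf 'x y\n' > /tmp/vars2.txt
# NR counts records across all files; FNR restarts at 1 for the second file
awk '{ print FILENAME, NR, FNR, NF }' /tmp/vars1.txt /tmp/vars2.txt
```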

-e program-text --source program-text
Use program-text as AWK program source code. This option allows the easy intermixing of library functions (used via the -f and --file options) with source code entered on the command line.

Pattern/Action...


By default, awk runs a loop. It can have parameters set before and/or after the loop body with BEGIN and END:

awkloop.png

"Of these three parts, the main input loop or "what happens during processing" is where most of the work
gets done. Inside the main input loop, your instructions are written as a series of pattern/action
procedures. A pattern is a rule for testing the input line to determine whether or not the action should be
applied to it. The actions, as we shall see, can be quite complex, consisting of statements, functions, and
expressions.
The main thing to remember is that each pattern/action procedure sits in the main input loop, which
takes care of reading the input line. The procedures that you write will be applied to each input line, one
line at a time."

So, you need to KNOW about the awk loop above to understand first-line behaviour. Say you want the number of chars in a line: set the FS (Field Separator) to null, so each char becomes a field even when there are no white spaces between chars:

cat hashes.txt
############################################################
############################################################
The number of hashes won't be read correctly for the first line, only the second, because the FS assignment runs after line 1 has already been split with the default FS. This could be a prob in a program eh? A headscratcher debugging it!

awk 'FS = ""; {print NF}' hashes.txt
0
60

You get round this by using a BEGIN block, which runs before the main loop body, so FS is already null when the first line is read:

awk 'BEGIN {FS=""} {print NF }' hashes.txt
60
60
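With GNU awk you can also set the empty FS on the command line with -F ''; like BEGIN { FS = "" }, it is in force before the first line is split (sample file below is made up for the demo):

```shell
printf '#####\n###\n' > /tmp/hashes2.txt
# -F '' sets FS before any input is read, so even line 1 splits per character
awk -F '' '{ print NF }' /tmp/hashes2.txt
```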

Awk is powerful because it is compact, quite easy to read, and generally gets a lot done in a relatively short script compared to other langs, e.g. C. For example, run Shotts' loop here to see:

ls -l / | awk '
$1 ~ /^-/ {t["Regular Files"]++}
$1 ~ /^d/ {t["Directories"]++}
$1 ~ /^l/ {t["Symbolic Links"]++}
END {for (i in t) print i ":\t" t[i]}
'
Symbolic Links: 2
Directories: 24
Regular Files: 1

Basic Behaviour

All awk variables are initialized to zero (or the empty string), so this script works immediately - x starts at 0 and is inc'd to 1 at the start:

BEGIN {
do {
++x

print x
} while ( x <= 4 )
}

awk -f do.awk
1
2
3
4
5

awk -e 'BEGIN {do {++x; print x} while ( x <= 4 )}'
1
2
3
4
5

Count Blank Lines (bad idea to make 6 blanks and 6 text, as seen later!!)

cat -n text
1
2 not a blank line
3
4 not a blank line
5
6 not a blank line
7
8 not a blank line
9
10 not a blank line
11 not a blank line
12

awk '{print}' text

not a blank line

not a blank line

not a blank line

not a blank line

not a blank line
not a blank line

awk '{ print NR }' text
1
2
3
4
5
6
7
8
9
10
11
12

awk '{ print NR, $0 }' text
1
2 not a blank line
3
4 not a blank line
5
6 not a blank line
7
8 not a blank line
9
10 not a blank line
11 not a blank line
12

awk '{ print NF, $0 }' text
0
4 not a blank line
0
4 not a blank line
0
4 not a blank line
0
4 not a blank line
0
4 not a blank line
4 not a blank line
0

vi awk.scr

# test for integer,
#/[0-9]+/{ print "That is an integer" }
#/[A-Za-z]+/ { print "This is a string" }
/^$/ { print "This is a blank line." }
#/^$/ {print x += 1}

awk -f awk.scr text
This is a blank line.
This is a blank line.
This is a blank line.
This is a blank line.
This is a blank line.
This is a blank line.

awk '/^$/ { print NR, "This is a blank line." }' text
1 This is a blank line.
3 This is a blank line.
5 This is a blank line.
7 This is a blank line.
9 This is a blank line.
12 This is a blank line.

This matches empty lines (start of line immediately followed by end of line) and increments a counter for each matching line processed

awk '/^$/ {print x += 1}' text
1
2
3
4
5
6

This matches each line start so counts all lines inc. blanks, same as NR.

awk '/^/ {print x += 1}' text;
1
2
3
4
5
6
7
8
9
10
11
12

This prints NF for each line, so blank lines show up as 0 fields

awk ' {print NF} ' text;
0
4
0
4
0
4
0
4
0
4
4
0
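To get just the total of blank lines instead of a running count, increment in the action and print once in END (n+0 forces a 0 rather than an empty string when there are no matches; sample file is made up here):

```shell
printf '\na\n\nb\n\n' > /tmp/blanks.txt
# Count blank lines in one pass, print only the total
awk '/^$/ { n++ } END { print n+0 }' /tmp/blanks.txt
```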

Or, by the opposite logic, match non-blank lines using Shotts' POSIX classes - but note they need double brackets: [[:alpha:]], [[:alnum:]], [[:print:]], [[:space:]] etc. The single-bracket /[:alpha]/ below is really a bracket list of the literal characters :, a, l, p and h; it only appears to work here because those letters occur in the text lines:

awk '/[:alpha]/ { print NR, "This is a string" }' text
2 This is a string
4 This is a string
6 This is a string
8 This is a string
10 This is a string
11 This is a string

This can be confusing; you may think blank lines have a newline at the end (they do!), but awk strips the record separator from each line before matching, so /^$/ above and [:ctrl] below never see it - yet Libre shows they're there, 12 in all. And the quoted '^$' version below holds a trap: in bash, $'...' is ANSI-C quoting, so the shell eats the $ and awk actually receives /^/, which matches all 12 records:

newlines.png

awk '/'^$'/ {print " record found" }' text    same as     awk '/^/ {print " record found" }' text
record found
record found
record found
record found
record found
record found
record found
record found
record found
record found
record found
record found

awk '//' text

not a blank line

not a blank line

not a blank line

not a blank line

not a blank line
not a blank line

awk '/ /' text
not a blank line
not a blank line
not a blank line
not a blank line
not a blank line
not a blank line

blank lines only:

awk '/^$/ {print NR " record found" }' text
1 record found
3 record found
5 record found
7 record found
9 record found
12 record found

non blanks apparently IDd by the newline char - but unquoted \n in the shell is just the letter n, so awk receives /n/ and matches any line containing an n:

awk '/'\n'/ { print NR " This is a string" }' text
2 This is a string
4 This is a string
6 This is a string
8 This is a string
10 This is a string
11 This is a string

non blanks apparently IDd by a non printable Ctrl char - but single-bracket [:ctrl] is really the literal characters :, c, t, r and l (the POSIX class would be [[:cntrl:]]); the lines match because they contain t and l:

awk '/[:ctrl]/ { print NR " This is a string" }' text
2 This is a string
4 This is a string
6 This is a string
8 This is a string
10 This is a string
11 This is a string

via a regex:

awk '/[0-9a-zA-Z]/ { print NR " This is a string" }' text
2 This is a string
4 This is a string
6 This is a string
8 This is a string
10 This is a string
11 This is a string

blank line removal (not necessarily the same as white space removal and you need to be careful about what else may be removed - like } in code!!):

awk 'gsub(" ","")' < text
notablankline
notablankline
notablankline
notablankline
notablankline
notablankline

Part of a CSS style file with blank lines and {} :

cat style.css 

text-indent: -9999px;

background: url(images/menu-indicator-right.png) no-repeat;

/*border: solid 1px red;*/

}

Piping it thru gsub as above removes blank lines but also the closing } from each section - gsub's return value (the number of substitutions) acts as the pattern, so any line with no spaces to remove, including the lone }, returns 0 and is never printed!!

cat style.css | awk 'gsub(" ","")'

.menu1ulul.sf-sub-indicator{
position:absolute;
width:7px;
height:15px;
top:8px;
right:15px;
text-indent:-9999px;
background:url(images/menu-indicator-right.png)no-repeat;
/*border:solid1pxred;*/

Can use : cat style.css | sed '/^[[:space:]]*$/d' OR cat style.css | sed '/^$/d' (note: sed '/^.$/d' would delete one-character lines - including the lone } - not blank ones)

.menu1 ul ul .sf-sub-indicator {
position: absolute;
width: 7px;
height: 15px;
top: 8px;
right: 15px;
text-indent: -9999px;
background: url(images/menu-indicator-right.png) no-repeat;
/*border: solid 1px red;*/
}

sed removes whitespace between words, but NOT blank lines with this:

sed 's/ //g' < text

notablankline

notablankline

notablankline

notablankline

notablankline
notablankline

Negation - a real world problem. A dir full of .c files with their binaries that are not +x so cannot run due to a past chmod error. ID all those with a .c extension, then negate the term to show only non-.c files. (Strictly the dot should be escaped, /\.c$/, since an unescaped . matches any character.)

ls

floating sevendebug vars
floating.c sevendebug.c vars.c
full sidefx vowels
full1 sidefx.c vowels.c
full1.c sieve vsphere
full.c sieve1.py vsphere.c
graph sieve.c zero
graph.c Sieve.c zero.c
guess SievefromWeb

shows only the .c files:

ls | awk '/.c$/'

triang.c
tri.c
trisub.c
twiceterm.c
two.c
vars.c
vowels.c
vsphere.c
zero.c

now negate it to show only the exe files that need chmod +x

ls | awk '!/.c$/'

triang2
trisub
twiceterm
two
UnitsConverter
vars
vowels
vsphere
zero

Now they can all be +x:

chmod +x `ls | awk '!/.c$/' `
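A find-based sketch of the same job (the scratch directory and file names below are made up for the demo): -name uses a glob rather than a regex so the dot needs no escaping, -type f skips directories, and it avoids parsing ls output:

```shell
mkdir -p /tmp/negdemo
touch /tmp/negdemo/vars /tmp/negdemo/vars.c /tmp/negdemo/vowels /tmp/negdemo/vowels.c
# +x everything in the directory that is a regular file and NOT a .c source
find /tmp/negdemo -maxdepth 1 -type f ! -name '*.c' -exec chmod +x {} +
```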

cexes.png


dstat -tc | awk '{if(NR>3) print $0 fflush()}' | tee dstatlive.txt (used with watch -n 1 gnuplot livedstat.gnu)

sadf -p -P 0,1,2,3 | grep user | grep cpu0 | awk '{print $3,$4,$8}'

cat data1.dat | awk '{print $3}' | sed ':a;N;$!ba;s/\n/; /g'  (creates 1 long line - comma delimits each line by removing \n, except last line )

sadf -p -P 0,1,2,3 | grep user | grep cpu0 | awk '{print $3,$4,$8}' | sed 's/:01//g' > data1.dat; gnuplot plot1.gnu; xdg-open userCPU.png (see gnu live data Posts)

sadf -p -P 0,1,2,3 | grep idle | grep cpu0 | awk '{print $3,$4,$8}' | sed 's/:01//g' > data1.dat; gnuplot plot1.gnu; mv -v userCPU.png idleCPU.png ; xdg-open idleCPU.png (see gnu live data Posts)

awk 'gsub(" ","")' < file.txt (removes whitespace)

cat 1_5.txt
1
2
3
4
5

AWK Subs

Sub a function for text:

awk 'gsub("blank",sqrt(2))' text
not a 1.41421 line
not a 1.41421 line
not a 1.41421 line
not a 1.41421 line
not a 1.41421 line
not a 1.41421 line

Replace all numbers with a char:

awk 'gsub("[0-9]","=")' 1_5.txt
=
=
=
=
=

cat ABABAB.txt
ABABAB

awk -e '{if (gsub(/A/,"B")) {print}}' < ABABAB.txt
BBBBBB

awk -e '{if (gsub(/B/," ")) {print}}' < ABABAB.txt
A A A

awk -e '{if (gsub(/B/,"")) {print}}' < ABABAB.txt
AAA

Subs the whole (lowercased) record for each single B - gsub evaluates tolower($0) against the original line once, then inserts that string at every match. Useful? Maybe:

awk -e '{if (gsub(/B/,tolower($0) )) {print}}' < ABABAB.txt
AabababAabababAababab

awk -e '{print tolower($0)}' < ABABAB.txt
ababab

Note the stray indentation that appears: the comma in print inserts OFS (a space) after the embedded \n, so the continuation line starts with a leading space:

awk -e '{print $0"\n",tolower($0)}' < ABABAB.txt
ABABAB
  ababab

awk -e '{print $0"\n", tolower($0)}' < ABABAB.txt | awk -e '{print $0"\n", toupper($0)}'
ABABAB
   ABABAB
   ababab
     ABABAB

Everything from the first awk is actually there in the 2nd version - each of its two output lines is echoed by the second awk, followed by that line's uppercased copy; the growing indentation (from the OFS spaces) just makes it hard to read.

cat ABABAB.txt | awk -e '{print tolower($0)}'
ababab

cat ABABAB.txt | awk  '{print tolower($0)}' | awk  '{print toupper($0)}'
ABABAB

awk -e '{if (gsub(/B/,"\v")) {print}}' < ABABAB.txt
A
  A
    A

awk -e '{if (gsub(/[A-Z]/, "1")) {print}}' < ABABAB.txt
111111

"Sed offers capabilities that seem a natural extension of interactive text editing. For instance, it offers a
search-and-replace facility that can be applied globally to a single file or a group of files. While you
would not typically use sed to change a term that appears once in a particular file, you will find it very
useful to make a series of changes across a number of files. Think about making 20 different edits in
over 100 files in a matter of minutes, and you get an idea of how powerful sed can be." E.G. Subs "In Place" for a mysql DB to change you web name for home server name globally - BACK IT UP first!! 

sed -i 's/www.stevepedwards.com/minimint/g' Downloads/backup-9.6.2016_19-37-11_steveped/mysql/steveped_debianadmin.sql

tr and perl are better to use when learning these concepts, as sed and awk don't work in the same simple search-and-replace format for "\n" type chars - both strip the newline from each line before the script sees it.

This is effectively what paste -s does anyway; you could also join with 1 white space " " instead of a tab \t:

arp -a | awk '{print $2}' | sed 's/^.//g' | sed 's/.$//g' | tr "\n" "\t"
192.168.1.2 192.168.1.7 192.168.1.1 192.168.1.3 192.168.1.8 192.168.1.4 192.168.1.5

arp -a | awk '{print $2}' | sed 's/^.//g' | sed 's/.$//g' | perl -pe 's/\n/\t/g'
192.168.1.2 192.168.1.7 192.168.1.1 192.168.1.3 192.168.1.8 192.168.1.4 192.168.1.5

sed and awk fail for this simple format requiring more cryptic parameters:

arp -a | awk '{print $2}' | sed 's/^.//g' | sed 's/.$//g' | sed 's/\n/\t/g'
192.168.1.2
192.168.1.7...

arp -a | awk '{print $2}' | sed 's/^.//g' | sed 's/.$//g' | sed '{:q;N;s/\n/ /g;t q}'
192.168.1.2 192.168.1.7 192.168.1.1 192.168.1.3 192.168.1.8 192.168.1.4 192.168.1.5

arp -a | awk '{print $2}' | sed 's/^.//g' | sed 's/.$//g' | awk 'gsub("\n","\t")'

no output - awk never sees the \n, as it is stripped as the record separator before matching

arp -a | awk '{print $2}' | sed 's/^.//g' | sed 's/.$//g' | awk '{printf "%s ",$0} END {print ""}'
192.168.1.2 192.168.1.7 192.168.1.1 192.168.1.3 192.168.1.8 192.168.1.4 192.168.1.5

OR

arp -a | awk '{print $2}' | sed 's/^.//g' | sed 's/.$//g' | awk 'BEGIN {RS=""}{gsub(/\n/," ",$0); print $0}'
192.168.1.2 192.168.1.7 192.168.1.1 192.168.1.3 192.168.1.8 192.168.1.4 192.168.1.5
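The paste -s mentioned earlier does the same newline-joining job directly; -d sets the join character (sample IPs written to a made-up scratch file):

```shell
printf '192.168.1.2\n192.168.1.7\n192.168.1.1\n' > /tmp/ips.txt
# -s = serial (join all lines of one file), -d = delimiter to join with
paste -sd' ' /tmp/ips.txt
```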

Perl evaluation/substitution of integers:

perl -pe 's/(\d+)$/"="x$1/e' 1_5.txt
=
==
===
====
=====

Oh look! It was right in front of me all the time above, adding print!! Dick...

perl -pe 's/(\d+)$/"="x$1/e; print "$1 "' 1_5.txt
1 =
2 ==
3 ===
4 ====
5 =====

Does it work for diff values on diff lines..? Yep!

cat 6_1.txt
6
5
4
3
2
1

perl -pe 's/(\d+)$/"="x$1/e; print "$1 "' 6_1.txt
6 ======
5 =====
4 ====
3 ===
2 ==
1 =

perl -pe 's/(\d+)$/"="x$1/e; print "$1 "' random.txt
90 ==========================================================================================
87 =======================================================================================
74 ==========================================================================
40 ========================================
90 ==========================================================================================
70 ======================================================================
46 ==============================================
10 ==========
35 ===================================
08 ========
1 =
0
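The same "=" bars are easy in awk too - build the bar with a loop (the file of integers below is made up; any such file works):

```shell
printf '3\n1\n2\n' > /tmp/nums.txt
# For each line, repeat "=" $1 times and print value plus bar
awk '{ bar = ""; for (i = 0; i < $1; i++) bar = bar "="; print $1, bar }' /tmp/nums.txt
```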

Remember - load avg is divided by the number of cores to get the value per core (an avg of 4.0 on a quad core = 4/4 = 100% utilization per core - PC blows up and you die...)

uptime
23:56:19 up 9:20, 5 users, load average: 0.88, 0.95, 0.81

perl -e 'while(1) {`uptime` =~ /average: ([\d.]+)/; printf("% 5s %s\n", $1, "#" x ($1 * 10)); sleep 3 }'

0.87 ########
0.87 ########
0.88 ########
0.97 #########
0.97 #########
1.21 ############
1.21 ############
1.11 ###########
1.10 ###########
1.10 ###########
1.10 ###########
1.10 ###########
1.01 ##########
0.93 #########
0.93 #########
0.85 ########
0.85 ########

For these types of single line histos, increase the multiplier for clearer proportions, accounting for decimal:

perl -e 'while(1) {`uptime` =~ /average: ([\d.]++)/; printf("% 5s %s\n", $1, "#" x ($1 * 100)); sleep 3 }'
0.91 ###########################################################################################
0.92 ############################################################################################
0.92 ############################################################################################
0.84 ####################################################################################
0.84 ####################################################################################
0.78 ##############################################################################
0.71 #######################################################################

Perl matches:

"." =~ /./ # Match

" " =~ /\h/ # Match, space is horizontal whitespace.

\d Match a decimal digit character.
\D Match a non-decimal-digit character.
\w Match a "word" character.
\W Match a non-"word" character.
\s Match a whitespace character.
\S Match a non-whitespace character.
\h Match a horizontal whitespace character.
\H Match a character that isn't horizontal whitespace.
\v Match a vertical whitespace character.
\V Match a character that isn't vertical whitespace.
\N Match a character that isn't a newline.
\pP, \p{Prop} Match a character that has the given Unicode property.
\PP, \P{Prop} Match a character that doesn't have the Unicode property

Trying to work out this cryptic perl lang... The one-liner above captures a number from the matching position, one line per loop pass, multiplies the value by 10, then prints that number of # chars. So...

For a generic data file with 1 line only, drop the while/sleep and amend the match - /average: ([\d.]+)/ - to capture the correct column's digits. printf("% 5s %s\n", $1, "#" x ($1 * 10)) then prints the captured value $1 followed by "#" repeated $1 * 10 times - the repetition count is truncated to an integer, which is what swallows the decimal fraction:

cat load.txt
load average: 1.15, 0.92, 0.57

perl -e '{`cat load.txt` =~ /([\d.]+)/; printf("% 5s %s\n", $1, "#" x ($1 * 10)) }'
1.15 ###########

e.g. ($x here is an unset variable, so it interpolates to nothing and the pattern is effectively the same):

perl -e '{`cat load.txt` =~ /$x ([\d.]+)/; printf("% 5s %s\n", $1, "#" x ($1 * 10)) }'
1.15 ###########

perl -pe '{}' 1_5.txt
1
2
3
4
5

Can't get it to do more than one line of the 5. Without while(<>), perl -e reads no input at all; and the /^/ =~ /([\d.]+)/ construct actually matches the digits against the result of the first match (1), not against the line, which is why $1 is always 1:

awk '{print}' 1_5.txt | perl -e '{/^/ =~ /([\d.]+)/; printf("% 5s %s\n", $1, "#" x ($1 * 10)) }'
1 ##########

Aah...awk histo for cmd history - nice....

history | awk '{h[$2]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}' |awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'
sudo 194 ############################################################
ls 151 ###############################################
cat 97 ##############################
fortune 57 ##################
perl 47 ###############
awk 41 #############
find 38 ############
du 35 ###########
vi 25 ########
rsync 23 ########
cd 23 ########
man 19 ######
crontab 17 ######
rm 16 #####
printf 16 #####
mplayer 14 #####
df 13 #####
for 11 ####
alias 10 ####
x11vnc 9 ###

With while(<>) every line is now read - but /^/=~/(\d)/ still matches against the result of the first match (always 1), not the line itself, which is why every bar is identical; match the line directly with /(\d)/ to get per-line values:

awk '{print}' 1_5.txt | perl -e 'while(<>){ /^/=~/(\d)/; printf("% 5s %s\n", $1, "#" x ($1 * 10))}'
1 ##########
1 ##########
1 ##########
1 ##########
1 ##########


grades.txt

john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84

awkscr

# average five grades
{ total = $2 + $3 + $4 + $5 + $6
avg = total / 5
print $1, avg }

awk -f awkscr grades.txt
john 87.4
andrea 86
jasper 85.6
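The averaging logic as a one-shot pipeline, to sanity-check the arithmetic (grades written to a scratch file for the demo):

```shell
printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86\n' > /tmp/grades.txt
# Sum fields 2-6 and divide by 5, printing name and average
awk '{ total = $2 + $3 + $4 + $5 + $6; print $1, total / 5 }' /tmp/grades.txt
```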

change:

# average five grades
{print NR ".", $1, avg }
END {
print ""
print NR, "records processed." }

gives:

awk -f awkscr grades.txt
1. john
2. andrea
3. jasper

3 records processed.

make OFS a tab:

BEGIN { OFS = "\t" }
{print NR ".", $1, avg }
END {
print ""
print NR, "records processed." }

awk -f awkscr grades.txt
1.            john
2.            andrea
3.            jasper

3 records processed.

combine all grades file options:

BEGIN { OFS = "\t" }
{total = $2 + $3 + $4 + $5 + $6
avg = total / 5
print NR ".", $1, avg }
END {
print ""
print NR, "records processed." }

awk -f awkscr grades.txt

1.            john 87.4
2.            andrea 86
3.            jasper 85.6

3 records processed.

For file presidents.txt of text:

John Kennedy
Lyndon B. Johnson
Richard Milhouse Nixon
Gerald R. Ford
Jimmy Carter
Ronald Reagan
George Bush
Bill Clinton

If above awkscr is run:

awk -f awkscr presidents.txt
1. John 0
2. Lyndon 0
3. Richard 0
4. Gerald 0
5. Jimmy 0
6. Ronald 0
7. George 0
8. Bill 0

8 records processed.

Amending for removal of avgs and finding OFS number of fields:

BEGIN { OFS = "\t" }
END {
print NF, "fields present."
print NR, "records processed." }

awk -f awkscr presidents.txt
2          fields present.
8          records processed.

But are there only 2 real fields? No - see Lyndon B. Johnson, Richard Milhouse Nixon and Gerald R. Ford. In the END block, NF simply holds its value from the last record read (Bill Clinton = 2 fields); it is not a file-wide count, and OFS only affects output, not how the text is split:

awk ' {print NF, $0}' presidents.txt

2 John Kennedy
3 Lyndon B. Johnson
3 Richard Milhouse Nixon
3 Gerald R. Ford
2 Jimmy Carter
2 Ronald Reagan
2 George Bush
2 Bill Clinton

You can find the text's no of fields for each record (which tells of a middle name or letter), with the last name (from the last field) with a total of records processed with just:

{print $NF
print NF, "fields present."
print NR, "records processed." }

awk -f awkscr presidents.txt
Kennedy
2 fields present.
1 records processed.
Johnson
3 fields present.
2 records processed.
Nixon
3 fields present.
3 records processed.
Ford
3 fields present.
4 records processed.
Carter
2 fields present.
5 records processed.
Reagan
2 fields present.
6 records processed.
Bush
2 fields present.
7 records processed.
Clinton
2 fields present.
8 records processed.

Or for explicit clarity:

awk ' {print $0" " NF " fields present" ; print NR, "records processed." }' presidents.txt
John Kennedy 2 fields present
1 records processed.
Lyndon B. Johnson 3 fields present
2 records processed.
Richard Milhouse Nixon 3 fields present
3 records processed.
Gerald R. Ford 3 fields present
4 records processed.
Jimmy Carter 2 fields present
5 records processed.
Ronald Reagan 2 fields present
6 records processed.
George Bush 2 fields present
7 records processed.
Bill Clinton 2 fields present
8 records processed.

Instead of having all the information on one line, the person's name is on one line, followed by the company's name on the next line and so on. Here's a sample record:
John Robinson
Koren Inc.
978 Commonwealth Ave.
Boston
MA 01760
696-0987
This file is really one record with six fields, where each field is on its own line and a blank line separates records. To process this data, we can specify a multi-line record by defining the field separator to be a newline, represented as "\n", and setting the record separator to the empty string (RS = ""), which stands for a blank line:

BEGIN { FS = "\n"; RS = "" }

We can print the first and last fields ($NF = $6 here) using the following script:
# block.awk - print first and last fields
# $1 = name; $NF = phone number

BEGIN { FS = "\n"; RS = "" }
{ print $1, $NF }

vi multiline_record.txt

John Robinson
Koren Inc.
978 Commonwealth Ave.
Boston
MA 01760
696-0987

vi block.awk

# block.awk - print first and last fields
# $1 = name; $NF = phone number

BEGIN { FS = "\n"; RS = "" }
{ print $1, $NF }

awk -f block.awk multiline_record.txt    

John Robinson 696-0987

awk -e '{ FS = "\n"; ORS = " "; print }' multiline_record.txt
John Robinson Koren Inc. 978 Commonwealth Ave. Boston MA 01760 696-0987 stevee@AMDA8 ~/Awk $
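Paragraph mode end to end: RS="" makes each blank-line-separated block one record and FS="\n" makes each line within it a field (the second address block below is invented for the demo):

```shell
printf 'John Robinson\nKoren Inc.\n696-0987\n\nJane Doe\nAcme Ltd.\n555-0199\n' > /tmp/blocks.txt
# $1 = first line of each block (name), $NF = last line (phone number)
awk 'BEGIN { FS = "\n"; RS = "" } { print $1, $NF }' /tmp/blocks.txt
```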


A key ability of awk is to set field separators of your choice, which then determine how many fields the file is read as. If a file (names.txt) contains:

John Rink,Boob Inc,2 Tities street,Boston,MA 666-555-1111
John Robinson,Koren Inc.,978 4th Ave.,Boston,MA 01760,696-0987
Phyllis Chapman,GVE Corp.,34 Sea Drive,Amesbury,MA 01881,879-0900
Phyllis Man,GVE Corp.,34 Sea Drive,Amesbury,MA 707-724-0000
Phil Thap,GVE Corp.,34 Sea Drive,Amesbury,MA (707) 724-0000
(Phyllis Chap,GVE Corp.,34 Sea Drive,Amesbury,MA (707)724-0000
Phyllis Apeman,GVE Corp.,34 Sea Drive,Amesbury,MA 1-707-724-0000
Phyllis Chipmunk,GVE Corp.,34 Sea Drive,Amesbury,MA1 707-724-0000
Phyllis Shaman,GVE Corp.,34 Sea Drive,Amesbury,MA 1(707)724-0000

You could choose to separate on say, commas, TABS (\t), single letters, any number range [0-9] etc.

This decides which column number needs stating in the script (print $x). The phonelist.awk script below keeps prior attempts commented out; the active version separates on a single space " ", so "creates" up to 7 columns depending on the record (line) content, and prints the first and last fields from the names data file:

vi phonelist.awk

# phonelist.awk -- print name and phone number.
# input file -- name, company, street, city, state and zip,phone
#BEGIN { FS = "," }
# comma-delimited fields
#{ print $1 ", " $6 }
#$5 !~ /MA/ { print $1 ", " $6 }
#$6 !~ /1?(-| )?\(?[0-9]+\)?( |-)?[0-9]+-[0-9]+/

BEGIN { FS = " " } # delimited fields
{ print $1,": " $NF }
END {
print ""
print NR, "records processed. \n" }

using phonelist.awk, becomes:

awk -f phonelist.awk names.txt
John : 666-555-1111
John : 01760,696-0987
Phyllis : 879-0900
Phyllis : 707-724-0000
Phil : 724-0000
(Phyllis : (707)724-0000
Phyllis : 1-707-724-0000
Phyllis : 1707-724-0000
Phyllis : 1(707)724-0000

9 records processed.

PRINT FIRST/LAST RECORD (line) ONLY:

efficient - doesn't read whole file:

awk 'NR==1 {print; exit}' presidents.txt

John Kennedy

awk '{if(NR<=1) print}' presidents.txt
John Kennedy

awk '{if(NR<2) print}' presidents.txt
John Kennedy

awk '{if(NR>7) print}' presidents.txt
Bill Clinton

awk '{if(NR>=8) print}' presidents.txt
Bill Clinton

awk 'END{print}' presidents.txt
Bill Clinton

awk -f phonelist.awk names.txt
John, 666-555-1111
John, 01760,696-0987
Phyllis, 01881,879-0900
Phyllis, 707-724-0000
Phil, (707)
(Phyllis, (707)724-0000
Phyllis, 1-707-724-0000
Phyllis, 707-724-0000
Phyllis, 1(707)724-0000

9 records processed.

If you choose a TAB (\t) then only 2 columns can be created for output, as the phone numbers are tab-delimited in this file:

BEGIN { FS = "\t" } # tab-delimited fields
{ print $2 }
END {
print ""
print NR, "records processed." }

awk -f phonelist.awk names
666-555-1111
01760,696-0987
01881,879-0900
707-724-0000
(707) 724-0000
(707)724-0000
1-707-724-0000
707-724-0000
1(707)724-0000

9 records processed.

For the example of names.txt, now with FS = "," (a comma), find how many fields each record has (count the commas and add 1, e.g. 5 commas = 6 fields):

cat names.txt
John Rink,Boob Inc,2 Tities street,Boston,MA,666-555-1111
John Robinson,Koren Inc.,978 4th Ave.,Boston,MA,01760,696-0987
Phyllis Chapman,GVE Corp.,34 Sea Drive,Amesbury,MA,01881,879-0900
Phyllis Man,GVE Corp.,34 Sea Drive,Amesbury,MA,707-724-0000
Phil Thap,GVE Corp.,34 Sea Drive,Amesbury,MA,(707) 724-0000
(Phyllis Chap,GVE Corp.,34 Sea Drive,Amesbury,MA,(707)724-0000
Phyllis Apeman,GVE Corp.,34 Sea Drive,Amesbury,MA,1-707-724-0000
Phyllis Chipmunk,GVE Corp.,34 Sea Drive,Amesbury,MA,1707-724-0000
Phyllis Shaman,GVE Corp.,34 Sea Drive,Amesbury,MA,1(707)724-0000

Combining presidents example from above and showing no. of space separated fields (space is default):

awk ' {print $0"\n" NF " fields present" ; print NR, "records processed." }' names.txt
John Rink,Boob Inc,2 Tities street,Boston,MA,666-555-1111
5 fields present
1 records processed.
John Robinson,Koren Inc.,978 4th Ave.,Boston,MA,01760,696-0987
5 fields present
2 records processed.
Phyllis Chapman,GVE Corp.,34 Sea Drive,Amesbury,MA,01881,879-0900
5 fields present
3 records processed.
Phyllis Man,GVE Corp.,34 Sea Drive,Amesbury,MA,707-724-0000
5 fields present
4 records processed.
Phil Thap,GVE Corp.,34 Sea Drive,Amesbury,MA,(707) 724-0000
6 fields present
5 records processed.
(Phyllis Chap,GVE Corp.,34 Sea Drive,Amesbury,MA,(707)724-0000
5 fields present
6 records processed.
Phyllis Apeman,GVE Corp.,34 Sea Drive,Amesbury,MA,1-707-724-0000
5 fields present
7 records processed.
Phyllis Chipmunk,GVE Corp.,34 Sea Drive,Amesbury,MA,1707-724-0000
5 fields present
8 records processed.
Phyllis Shaman,GVE Corp.,34 Sea Drive,Amesbury,MA,1(707)724-0000
5 fields present
9 records processed.

Made clearer if I replace commas with comma and space for first record and try again:

John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111

awk ' {print $0"\n" NF " fields present" ; print NR, "records processed." }' names.txt | head -3
John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111
10 fields present
1 records processed.

Lesson there is to keep records clearly delimited with a ", " or ": " or similar, for a predictable field count; names will still be auto split into first and last name fields by the space. For CSV files, set FS to "," - though assigning it inside the action, as here, only takes effect from the second record, because each line is split before the action runs:

awk ' {FS="," ; print $0"\n" NF " fields present" ; print NR, "records processed." }' names.txt
John Robinson,Koren Inc.,978 4th Ave.,Boston,MA,01760,696-0987
7 fields present
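Demonstrating the FS timing in miniature (scratch CSV made up for the demo): an assignment inside the action misses the first record, which is already split, while -F applies from the start:

```shell
printf 'a,b,c\nd,e,f\n' > /tmp/csv.txt
# FS set in the action: record 1 still splits on whitespace (1 field)
awk '{ FS = ","; print NF }' /tmp/csv.txt
# FS set with -F: every record splits on commas (3 fields)
awk -F',' '{ print NF }' /tmp/csv.txt
```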

Interleave lines from one file into another. Remember - you need to redirect to > newfile.txt for a new copy, and drop the -n from cat to remove the numbers:

awk 'NR%2 == 1 {getline f2 < "presidents.txt"; print f2} 1' names.txt | cat -n

1 John Kennedy
2 John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111
3 John Robinson, Koren Inc., 978, 4th Ave., Boston, MA, 01760,696-0987
4 Phyllis Chapman, GVE Corp., 34 Sea Drive, Amesbury, MA, 01881 879-0900
5 Phyllis Man, GVE Corp., 34 Sea Drive, Amesbury, MA, 707-724-0000
6 Phil Thap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707) 724-0000
7 (Phyllis Chap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707)724-0000
8 Phyllis Apeman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1-707-724-0000
9 Phyllis Chipmunk, GVE Corp., 34 Sea Drive, Amesbury, MA, 1707-724-0000
10 Lyndon B. Johnson
11 Phyllis Shaman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1(707)724-0000

The modulus must be at least 2 for the merge to work: NR%2 == 1 is true on odd-numbered records, so a president line is pulled in before records 1, 3, 5 and so on. Setting it to 1 proves the point - any number mod 1 is 0, so NR%1 == 1 is never true and nothing from the second file is inserted:

awk 'NR%1 == 1 {getline f2 < "presidents.txt"; print f2} 1' names.txt | cat -n
1 John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111
2 John Robinson, Koren Inc., 978, 4th Ave., Boston, MA, 01760,696-0987....

~ Matches - search for a string in a specific field and if it matches, print xxx..

awk ' { print $0"\n" NF " fields present" }' names.txt
John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111
10 fields present
John Robinson, Koren Inc., 978, 4th Ave., Boston, MA, 01760,696-0987
10 fields present
Phyllis Chapman, GVE Corp., 34 Sea Drive, Amesbury, MA, 01881 879-0900
11 fields present
Phyllis Man, GVE Corp., 34 Sea Drive, Amesbury, MA, 707-724-0000
10 fields present
Phil Thap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707) 724-0000
11 fields present
(Phyllis Chap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707)724-0000
10 fields present
Phyllis Apeman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1-707-724-0000
10 fields present
Phyllis Chipmunk, GVE Corp., 34 Sea Drive, Amesbury, MA, 1707-724-0000
10 fields present
Phyllis Shaman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1(707)724-0000
10 fields present

awk '$4 ~/Inc/ {print $1,$2,$4 " " $10}' names.txt
John Rink, Inc, 666-555-1111
John Robinson, Inc., 01760,696-0987

grep Inc names.txt
John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111
John Robinson, Koren Inc., 978, 4th Ave., Boston, MA, 01760,696-0987


The cheque book deduction example - from here on, one liners start getting too long, so you need to program with scripts, and comments become compulsory as for any script. It does a lot with just 2 rules:

cat cheques.awk
# checkbook.awk
BEGIN { FS = "\t" }
#1 Expect the first record to have the starting balance.
NR == 1 { print "Beginning Balance: \t" $1
balance = $1
next
# get next record and start over
}
#2 Apply to each check record, subtracting amount from balance
{
print $1, $2, $3
print balance -= $3
}

cat spent.txt
1000
125 Market 125.45
126 Hardware Store 34.95
127 Video Store 7.45
128 Book Store 14.32
129 Gasoline 16.10

This deducts the spends from a 1000 balance:

awk -f cheques.awk spent.txt
Beginning Balance: 1000
125 Market 125.45
874.55
126 Hardware Store 34.95
839.6
127 Video Store 7.45
832.15
128 Book Store 14.32
817.83
129 Gasoline 16.10
801.73

This can be tidied by making the current balance clearer in line with the start balance:

# checkbook.awk
BEGIN { FS = "\t" }
#1 Expect the first record to have the starting balance.
NR == 1 { print "Beginning Balance: \t" $1
balance = $1
next
# get next record and start over
}
#2 Apply to each check record, subtracting amount from balance
{
print $1, $2, $3
print "Current Balance: \t", balance -= $3
}

awk -f cheques.awk spent.txt
Beginning Balance: 1000
125 Market 125.45
Current Balance: 874.55
126 Hardware Store 34.95
Current Balance: 839.6
127 Video Store 7.45
Current Balance: 832.15
128 Book Store 14.32
Current Balance: 817.83
129 Gasoline 16.10
Current Balance: 801.73
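One niggle in that output: plain print drops trailing zeros (839.6 rather than 839.60). A minimal sketch of the same two rules using printf "%.2f" instead - demo_spent.txt is a made-up stand-in for spent.txt:

```shell
# same cheque-book logic, but printf "%.2f" keeps two decimal places
printf '1000\n125\tMarket\t125.45\n126\tHardware Store\t34.95\n' > demo_spent.txt
awk 'BEGIN { FS = "\t" }
NR == 1 { printf "Beginning Balance:\t%.2f\n", $1; balance = $1; next }
{ print $1, $2, $3; printf "Current Balance:\t%.2f\n", balance -= $3 }' demo_spent.txt
```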

awk '{if (NR>0) print}' names.txt
John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111
John Robinson, Koren Inc., 978, 4th Ave., Boston, MA, 01760,696-0987
Phyllis Chapman, GVE Corp., 34 Sea Drive, Amesbury, MA, 01881 879-0900
Phyllis Man, GVE Corp., 34 Sea Drive, Amesbury, MA, 707-724-0000
Phil Thap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707) 724-0000
(Phyllis Chap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707)724-0000
Phyllis Apeman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1-707-724-0000
Phyllis Chipmunk, GVE Corp., 34 Sea Drive, Amesbury, MA, 1707-724-0000
Phyllis Shaman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1(707)724-0000

You can print the last record only, now you know how many there are (9):

awk '{if (NR>8) print}' names.txt
Phyllis Shaman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1(707)724-0000

Operator Descriptions
|| Logical OR
&& Logical AND
! Logical NOT

awk '{if (NF == 10 && NR > 0) print}' names.txt
John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111
John Robinson, Koren Inc., 978, 4th Ave., Boston, MA, 01760,696-0987
Phyllis Man, GVE Corp., 34 Sea Drive, Amesbury, MA, 707-724-0000
(Phyllis Chap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707)724-0000
Phyllis Apeman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1-707-724-0000
Phyllis Chipmunk, GVE Corp., 34 Sea Drive, Amesbury, MA, 1707-724-0000
Phyllis Shaman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1(707)724-0000

awk '{if (NF == 11 && NR > 0) print}' names.txt
Phyllis Chapman, GVE Corp., 34 Sea Drive, Amesbury, MA, 01881 879-0900
Phil Thap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707) 724-0000

Precedence and ~ Matches

The names.txt file records have 10 or 11 fields:

awk '{print NF}' names.txt
10
10
11
10
11
10
10
10
10
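Rather than eyeballing that list, awk can tally the field counts itself with an array - a sketch, with inline data standing in for names.txt (the sort is needed because for-in order is unspecified):

```shell
# tally how many records have each field count
printf 'a b\na b\na b c\n' | awk '{ cnt[NF]++ } END { for (n in cnt) print n, cnt[n] }' | sort
```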

The first && combination finds records with 11 or more fields - NR > 0 is always TRUE, so NF >= 11 decides, and the matches print:

awk '{if (NR > 0 && NF >= 11) print}' names.txt
Phyllis Chapman, GVE Corp., 34 Sea Drive, Amesbury, MA, 01881 879-0900
Phil Thap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707) 724-0000

The next one gives the same output, because the OR side is FALSE - no first field contains an X:

awk '{if ((NR > 0 && NF >= 11) || $1 ~ /X/) print}' names.txt
Phyllis Chapman, GVE Corp., 34 Sea Drive, Amesbury, MA, 01881 879-0900
Phil Thap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707) 724-0000

BUT...if a record satisfies either the && pair or the OR condition, it is printed - so the record starting with a "(" appears too, even though it has only 10 fields. The first 2 lines have 11 fields, so they are also TRUE.

awk '{if ((NR > 0 && NF >= 11) || $1 ~ /\(/) print}' names.txt
Phyllis Chapman, GVE Corp., 34 Sea Drive, Amesbury, MA, 01881 879-0900
Phil Thap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707) 724-0000
(Phyllis Chap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707)724-0000

For the OR side to decide the outcome, one or both of the && conditions must be FALSE while the OR is TRUE. Here the first field contains a P (TRUE) && the second field contains a Q (FALSE), so the && pair is FALSE. The OR condition is TRUE for the one record whose first field starts with "(":

awk '{if ($1 ~ /P/ && $2 ~ /Q/ || $1 ~ /\(/) print }' names.txt
(Phyllis Chap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707)724-0000
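&& binds tighter than ||, so the test above is really ($1 ~ /P/ && $2 ~ /Q/) || $1 ~ /\(/. Writing the parentheses explicitly behaves identically - a sketch with made-up stand-in records:

```shell
# the parenthesised form makes the precedence visible; only the "(..." record prints
printf 'Phyllis x\n(Paren x\n' | awk '{ if (($1 ~ /P/ && $2 ~ /Q/) || $1 ~ /\(/) print }'
```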

If neither side are TRUE, nothing is printed:

awk '{if ($1 ~ /Z/ && $2 ~ /Z/ || $1 ~ /Z/) print }' names.txt

The logic is negated by the "!" so all records are printed:

awk '{if ($1 !~ /Z/ && $2 !~ /Z/ || $1 !~ /Z/) print }' names.txt
John Rink, Boob Inc, 2 Tities street, Boston, MA, 666-555-1111
John Robinson, Koren Inc., 978, 4th Ave., Boston, MA, 01760,696-0987
Phyllis Chapman, GVE Corp., 34 Sea Drive, Amesbury, MA, 01881 879-0900
Phyllis Man, GVE Corp., 34 Sea Drive, Amesbury, MA, 707-724-0000
Phil Thap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707) 724-0000
(Phyllis Chap, GVE Corp., 34 Sea Drive, Amesbury, MA, (707)724-0000
Phyllis Apeman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1-707-724-0000
Phyllis Chipmunk, GVE Corp., 34 Sea Drive, Amesbury, MA, 1707-724-0000
Phyllis Shaman, GVE Corp., 34 Sea Drive, Amesbury, MA, 1(707)724-0000

ls -l
-rw-r--r-- 1 stevee stevee 512 Aug 24 06:11 awk.scr
-rw-r--r-- 1 stevee stevee 157 Aug 23 19:28 block.awk
-rw-r--r-- 1 stevee stevee 240 Aug 21 20:29 blocklist.awk
-rw-r--r-- 1 stevee stevee 293 Aug 23 19:28 cheques.awk
-rw-r--r-- 1 stevee stevee 64 Aug 22 20:52 grades.txt
-rw-r--r-- 1 stevee stevee 72 Aug 23 17:28 multiline_record.txt
-rw-r--r-- 1 stevee stevee 612 Aug 24 08:17 names.txt
-rw-r--r-- 1 stevee stevee 368 Aug 22 23:36 phonelist.awk
-rw-r--r-- 1 stevee stevee 121 Aug 23 16:40 presidents.txt
-rw-r--r-- 1 stevee stevee 109 Aug 23 19:26 spent.txt
-rw-r--r-- 1 stevee stevee 144 Aug 24 07:24 text

Creating a running total of file bytes from the ls command using a script "fls.awk", making it executable and starting with:

ls -l $* | awk '{print $5, "\t", $9}'

512 awk.scr
157 block.awk
240 blocklist.awk
293 cheques.awk
64 grades.txt
72 multiline_record.txt
612 names.txt
368 phonelist.awk
121 presidents.txt
109 spent.txt
144 text

./fls.awk
512 awk.scr
157 block.awk
240 blocklist.awk
293 cheques.awk
38 fls.awk
64 grades.txt
72 multiline_record.txt
612 names.txt
368 phonelist.awk
121 presidents.txt
109 spent.txt
144 text

./fls.awk *txt
64 grades.txt
72 multiline_record.txt
612 names.txt
121 presidents.txt
109 spent.txt

Adding to script fls.awk to become:

ls -l $* | awk '{print $5, "\t", $9}'
{
sum += $5
++filenum
print $5, "\t", $9
}
END { print "Total: ", sum, "bytes (" filenum " files)" }

This is an example of the bad proof reading - you have to work it out for yourself. I got it to work, where fls.awk is:
BEGIN { print "BYTES", "\t", "FILE" }
{print $5, "\t", $9
sum += $5
+filenum}
END { print "Total: ", sum, "bytes (" filenum NR " files)" }

ls -l | awk -f fls.awk
BYTES FILE

512 awk.scr
157 block.awk
240 blocklist.awk
293 cheques.awk
154 fls.awk
64 grades.txt
72 multiline_record.txt
612 names.txt
368 phonelist.awk
121 presidents.txt
109 spent.txt
144 text
Total: 2846 bytes (13 files)

Checked with calc and it's correct. (Note the count really comes from NR - the lone +filenum never increments anything - and NR includes the "total" header line that ls -l prints first, so "13 files" is actually 12 files plus that line.) Needs better formatting - the Total is in the wrong place etc.:

BEGIN {print "BYTES", "\t", "FILE" }
{print $5, "\t", $9
sum += $5
+filenum}
END {print sum,":Bytes ("filenum NR," files)" }

Now put all this in another executable..? The awk script always has to be in the same dir as the sh script..hacky..! It needs writing as a full one-liner in a sh script sometime...but works for now.
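For reference, a self-contained one-liner sketch of the same report, with the counter actually incrementing (++filenum) and NR > 1 skipping the "total" header line that ls -l prints first:

```shell
# per-file bytes plus a real file count; skip ls's leading "total" line
ls -l | awk 'NR > 1 { print $5 "\t" $9; sum += $5; ++filenum }
END { printf "Total: %d bytes (%d files)\n", sum, filenum }'
```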

vi filesum.sh

ls -l | awk -f fls.awk

chmod u+x filesum.sh

./filesum.sh *
BYTES FILE

512 awk.scr
157 block.awk
240 blocklist.awk
293 cheques.awk
23 filesum.sh
140 fls.awk
64 grades.txt
72 multiline_record.txt
612 names.txt
368 phonelist.awk
121 presidents.txt
109 spent.txt
144 text
2855 :Bytes (14 files)

It even correlates with:

du -bc *
512 awk.scr
157 block.awk
240 blocklist.awk
293 cheques.awk
23 filesum.sh
140 fls.awk
64 grades.txt
72 multiline_record.txt
612 names.txt
368 phonelist.awk
121 presidents.txt
109 spent.txt
144 text
2855 total

Formatting with printf - terminal output is very different from the book's PDF output - experiment..

printf '|\n'
|

printf '("|")\n'
("|")

An awk shell script as a grep-like search on $1:

vi acronyms.txt

BASIC Beginner's All-Purpose Symbolic Instruction Code
CICS Customer Information Control System
COBOL Common Business Oriented Language
DBMS Data Base Management System
GIGO Garbage In, Garbage Out
GIRL Generalized Information Retrieval Language
AWK Aho Weinberger Kernighan
PERL Practical Extraction and Reporting Language
FORTRAN Formula Translation

vi search.sh

#! /bin/sh
# assign shell's $1 to awk search variable
awk '$1 ~ search' search=$1 acronyms.txt

./search.sh GI 
GIGO Garbage In, Garbage Out
GIRL Generalized Information Retrieval Language

./search.sh a

./search.sh A
BASIC Beginner's All-Purpose Symbolic Instruction Code
AWK Aho Weinberger Kernighan
FORTRAN Formula Translation

./search.sh AW
AWK Aho Weinberger Kernighan
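A sketch of an alternative way to pass the shell argument in: awk's -v option sets the variable before the program starts, instead of the trailing search=$1 assignment (demo_acronyms.txt is a stand-in file here):

```shell
# -v assigns the awk variable up front; same grep-like behaviour
printf 'GIGO Garbage In, Garbage Out\nAWK Aho Weinberger Kernighan\n' > demo_acronyms.txt
awk -v search=AW '$1 ~ search' demo_acronyms.txt
```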

A for loop to print each field - one word at a time:

vi awkfor.awk

{for ( i = 1; i <= NF; i++ )
print $i}

awk -e '{for ( i = 1; i <= NF; i++ ) print $i }' acronyms.txt

BASIC
Beginner's
All-Purpose
Symbolic
Instruction
Code
CICS
Customer
Information
Control
System
COBOL
Common
Business
Oriented
Language
DBMS
Data
Base
Management
System
GIGO
Garbage
In,
Garbage
Out
GIRL
Generalized
Information
Retrieval
Language

This can be used as the basis for a word count statistics example:

awk -e '{for ( i = 1; i <= NF; i++ ) print $i }' acronyms.txt | sort | uniq -c
1 All-Purpose
1 Base
1 BASIC
1 Beginner's
1 Business
1 CICS
1 COBOL
1 Code
1 Common
1 Control
1 Customer
1 Data
1 DBMS
2 Garbage
1 Generalized
1 GIGO
1 GIRL
1 In,
2 Information
1 Instruction
2 Language
1 Management
1 Oriented
1 Out
1 Retrieval
1 Symbolic
2 System

Now count word distribution of some random text:

fortune | awk -e '{for ( i = 1; i <= NF; i++ ) print $i }' | sort | uniq -c

1 A:
2 an
1 and
1 between
1 difference
1 drunk.
2 Irish
1 less
1 One
1 Q:
1 the
1 wake?
1 wedding
1 What's

Or letter/char distribution in the sys dictionaries:

cat /usr/share/dict/british-english | awk -e '{FS="" ; for ( i = 1; i <= NF; i++ ) print $i }' | sort | uniq -c
26262 '
63180 a
1288 A
10 á
6 â
3 å
1 Å
4 ä
14293 b
1247 B
30435 c
1419 C
5 ç
27771 d
734 D
88096 e

596 E....

89882 s

That stat is misleading for the letter s because this dictionary lists plurals/possessives: 1 youngsters, 1 youngster's. The word count total is 99156, so there are many extra s's.

You would need the total letter sample in the dictionary for stats.

awk -e '{for ( i = 1; i <= NF; i++ ) print $i }' < alphabet.txt | sort | uniq -c
1 a
1 b
1 c
1 d
1 e
1 f
1 g
1 h
1 i
1 j
1 k
1 l
1 m
1 n
1 o
1 p
1 q
1 r
1 s
1 t
1 u
1 v
1 w
1 x
1 y
1 z

Do a reverse loop - counting down gives the same result, though only because each line here has a single field. The condition needs to be i >= 1; with i >= NF (as originally written) the loop body runs just once per line, which is harmless when NF is 1 but wrong in general:

awk -e '{for ( i = NF; i >= 1; i-- ) print $i }' < alphabet.txt
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
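With one field per line the direction is invisible; a multi-word line shows a genuine countdown from NF to 1:

```shell
# print a line's fields in reverse order
echo 'one two three' | awk '{ for (i = NF; i >= 1; i--) print $i }'
```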

Count no. of loops/lines processed:

awk -e '{for ( i = 1; i <= NF; i++ ) print $i, x += 1 }' < alphabet.txt
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9
j 10
k 11
l 12
m 13
n 14
o 15
p 16
q 17
r 18
s 19
t 20
u 21
v 22
w 23
x 24
y 25
z 26

cat TheRavenV1_3.txt
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore—
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
“’Tis some visitor,” I muttered, “tapping at my chamber door—
Only this and nothing more.”

Ah, distinctly I remember it was in the bleak December;
And each separate dying ember wrought its ghost upon the floor.
Eagerly I wished the morrow;—vainly I had sought to borrow
From my books surcease of sorrow—sorrow for the lost Lenore—
For the rare and radiant maiden whom the angels name Lenore—
Nameless here for evermore.

And the silken, sad, uncertain rustling of each purple curtain
Thrilled me—filled me with fantastic terrors never felt before;
So that now, to still the beating of my heart, I stood repeating
“’Tis some visitor entreating entrance at my chamber door—
Some late visitor entreating entrance at my chamber door;—
This it is and nothing more.”

cat TheRavenV1_3.txt | awk -e '{FS="" ; for ( i = 1; i <= NF; i++ ) print $i }' | sort | uniq -c
221
11 ,
4 ;
5 .
2 ’
3 “
3 ”
9 —
55 a
4 A
1 and
12 b
16 c
26 d
1 D
1 dreary,
91 e
1 E
13 f
2 F
20 g
32 h
45 i
7 I
3 k
26 l
2 L
33 m
1 midnight
54 n
1 N
64 o
2 O
1 Once
15 p
1 pondered,
1 q
67 r
37 s
2 S
62 t
4 T
14 u
1 upon
8 v
10 w
1 W
1 weak
1 weary,
1 while
15 y
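Note the whole words that leak into that count (Once, upon, midnight, dreary, and so on) - they are exactly the words of line 1. FS="" is assigned inside the rule, after the first record has already been split on whitespace, so it only takes effect from record 2. Setting it in a BEGIN block splits every line into characters - a sketch (per-character FS is a gawk extension):

```shell
# FS must be set before the first record is read for it to affect that record
printf 'abc\nde\n' | awk 'BEGIN { FS = "" } { for (i = 1; i <= NF; i++) print $i }'
```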

Examining the history histogram function - how it works, and whether it can be applied elsewhere:

history | awk '{h[$2]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}' |awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'
awk 419 ############################################################
perl 216 ###############################
cat 135 ####################
vi 39 ######
printf 34 #####
echo 26 ####
for 17 ###
budellmint 15 ###
./search.sh 14 ###
uptime 12 ##
ls 7 ##
df 5 #
history 4 #
hexdump 4 #
od 3 #
clear 3 #
90 3 #
87 3 #
74 3 #
70 3 #

Just see what happens piping RavenV3 in:

cat TheRavenV1_3.txt | awk '{h[$2]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}' |awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'
the 2 ############################################################
some 2 ############################################################
I 2 ############################################################
2 ############################################################
upon 1 ##############################
this 1 ##############################
that 1 ##############################
of 1 ##############################
my 1 ##############################
me—filled 1 ##############################
many 1 ##############################
late 1 ##############################
it 1 ##############################
here 1 ##############################
each 1 ##############################
distinctly 1 ##############################

First, for reference - TheRavenV1_3.txt is 20 lines and 169 words:

wc -l < TheRavenV1_3.txt

20
wc -w < TheRavenV1_3.txt
169

The function is actually two awk commands that can be treated separately - good!

history | awk '{h[$2]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}'
419 awk
216 perl
122 cat
39 vi...

The first part above is easyish - it loops to count the occurrences of each command used, sorts high to low numerically, then heads it to only 20. The full history record size varies with setup and Linux dist; the Mint default list is:

history | wc -l
1000

The awk part reads $2 of each history line for the command name, and outputs the count first, then the command:

history | tail -3
2208 wc -w < TheRavenV1_3.txt
2209 history | head -3
2210 history | tail -3

history | awk '{h[$2]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}' |awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}' | head -2
awk 419 ############################################################
perl 216 ###############################

It puts the contents of $2 of the history list into an array h indexed by command name, so h[$2] increments each time that command appears; in the END for loop, i is each command name and h[i] its count. This can be shown by printing the contents of each section:

Array command list contents:

history | awk '{h[$2]++; print $2}' | uniq -c | head
5 cat
3 perl
1 uptime
1 vi
22 cat
1 sudo
2 perl
1 uptime
10 cat
10 uptime

Array Counter:
history | awk '{h[$2]++; print h[$2]}' | head -3
1
2
3

For loop counter contents:

history | awk '{h[$2]++}END{for(i in h){print h[i],i }}'
1 rm
419 awk
1 sudo
2 cd
3 clear
39 vi...

This also shows I have used only 45 different commands in the whole 1000 size list:

history | awk '{h[$2]++}END{for(i in h){print h[i],i | "sort -rn"}}' | wc -l
45

Summary Part 1 - the commands in column $2 of the history output are stored as array indexes, counted for occurrences of each command and labelled using a FOR loop, then numerically sorted by quantity and headed.

Part 2

awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'

The next awk command can't be shown directly as it needs the prior awk output, but since the first part feeding it comprises only 2 columns, a different 2-column input can be fed in to learn its behaviour. Using cat -n to number the acronyms file from the earlier examples gives suitable input:

cat -n acronyms.txt | awk '{print $1,$2}' | awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'
BASIC 1 ############################################################
CICS 2 ########################################################################################################################
COBOL 3 ####################################################################################################################################################################################

Now it is obvious where the bars come from: 60*$1/max hashes per line (max here is 1, the first numbered line). If the 60 multiplier is reduced to 1, it shows:

cat -n acronyms.txt | awk '{print $1,$2}' | awk '!max{max=$1;}{r="";i=s=1*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'
BASIC 1 #
CICS 2 ##
COBOL 3 ###
DBMS 4 ####
GIGO 5 #####
GIRL 6 ######
AWK 7 #######
PERL 8 ########
FORTRAN 9 ######### 

Summary Part 2 

An experienced programmer would have seen immediately that it is part 2 that is the generic useful part for histogram usage, as this contains the multiplier to suit various number ranges. The !max pattern is TRUE only on the first line (max is still unset, i.e. 0); that line's $1 becomes max, and because the input arrives sorted high to low, max is the largest value in the list. Every bar is then 60*$1/max hashes, so the top value gets a full-width bar and the rest scale in proportion.

Point is, it can be used in isolation for many basic number lists - experimenting now...

It works correctly on the random.txt file, scaling automatically AND padding single digits properly in a terminal. The 60 is simply the maximum bar width in characters; with a top value of 90, everything below is scaled by 60/90 = 2/3...very clever...

cat random.txt | awk '{print $1,$2}' | awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'
90 ############################################################
87 ##########################################################
74 ##################################################
40 ###########################
90 ############################################################
70 ###############################################
46 ###############################
10 #######
35 ########################
  8 ######
  1 #
  0
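The scaling can be pinned down in isolation with a tiny inline list - the first (maximum, since pre-sorted) value sets the scale and every bar gets 60*$1/max hashes:

```shell
# the first value (90) sets the scale: 90 -> 60 hashes, 45 -> 30, 30 -> 20
printf '90\n45\n30\n' | awk '!max { max = $1 }
{ r = ""; i = 60 * $1 / max; while (i-- > 0) r = r "#"; print $1, r }'
```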

Factorial.awk:

--------------------------------------------------------------------------

awk '

# factorial: return factorial of user-supplied number
BEGIN {
# prompt user; use printf, not print, to avoid the newline
printf("Enter number: ")
}
# check that user enters a number
$1 ~ /^[0-9]+$/ {
# assign value of $1 to number & fact
number = $1
if (number == 0)
fact = 1
else
fact = number
# loop to multiply fact*x until x = 1
for (x = number - 1; x > 1; x--)
fact *= x
printf("The factorial of %d is %g\n", number, fact)
# exit -- saves user from typing CTRL-D.
exit
}
# if not a number, prompt again.
{ printf("\nInvalid entry. Enter a number: ")
}' -

--------------------------------------------------------------------------

chmod +x factorial.awk

./factorial.awk
Enter number: 4
The factorial of 4 is 24

./factorial.awk
Enter number: f

Invalid entry. Enter a number: 0
The factorial of 0 is 1

vi grades.txt

mona 70 77 85 83 70 89
john 85 92 78 94 88 91
andrea 89 90 85 94 90 95
jasper 84 88 80 92 84 82
dunce 64 80 60 60 61 62
ellis 90 98 89 96 96 92

vi grades.awk

# grades.awk -- average student grades and determine
# letter grade as well as class averages.
# $1 = student name; $2 - $NF = test scores.
# set output field separator to tab.
BEGIN { OFS = "\t" }
# action applied to all input lines
{
# add up grades
total = 0
for (i = 2; i <= NF; ++i)
total += $i
# calculate average
avg = total / (NF - 1)
# assign student's average to element of array
student_avg[NR] = avg
# determine letter grade
if (avg >= 90) grade = "A"
else if (avg >= 80) grade = "B"
else if (avg >= 70) grade = "C"
else if (avg >= 60) grade = "D"
else grade = "F"
# increment counter for letter grade array
++class_grade[grade]
# print student name, average and letter grade
print $1, avg, grade
}
# print out class statistics
END {
# calculate class average
for (x = 1; x <= NR; x++)
class_avg_total += student_avg[x]
class_average = class_avg_total / NR
# determine how many above/below average
for (x = 1; x <= NR; x++)
if (student_avg[x] >= class_average)
++above_average
else
++below_average
# print results
print ""
print "Class Average: ", class_average
print "At or Above Average: ", above_average
print "Below Average: ", below_average
# print number of students per letter grade
for (letter_grade in class_grade)
print letter_grade ":", class_grade[letter_grade] | "sort"
}

awk -f grades.awk grades.txt

Class Average: 83.4167
At or Above Average: 4
Below Average: 2
A: 2
B: 2
C: 2
D: 2

Doesn't add up - the letter grades total 8 for only 6 students - and the prog had to be "fixed" wrongly by me. Proof read the fucking thing OReilly!! The per-student lines are also missing from the output, because in the book's listing the print statement is fused onto the end of a comment. This output part is missing..

mona 79 C
john 88 B
andrea 90.5 A
jasper 85 B
dunce 64.5 D
ellis 93.5 A

The keyword in is also an operator that can be used in a conditional expression to test that a subscript is a member of an array. The expression:
item in array

returns 1 if array[item] exists and 0 if it does not.
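A one-line check of that behaviour:

```shell
# "x" in a is 1 (the element exists); "y" in a is 0 - and the test does not create a["y"]
awk 'BEGIN { a["x"] = 1; print ("x" in a), ("y" in a) }'
```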

vi lookup.awk

--------------------

awk '# lookup -- reads local glossary file and prompts user for query
#0
BEGIN { FS = "\t"; OFS = "\t"
# prompt user
printf("Enter a glossary term: ")
}
#1 read local file named glossary
FILENAME == "glossary.txt" {
# load each glossary entry into an array

entry[$1] = $2
next
}
#2 scan for command to exit program
$0 ~ /^(quit|[qQ]|exit|[Xx])$/ { exit }
#3 process any non-empty line
$0 != "" {
if ( $0 in entry ) {
# it is there, print definition
print entry[$0]
} else
print $0 " not found"
}
#4 prompt user again for another term
{
printf("Enter another glossary term (q to quit): ")
}' glossary.txt -

lookup.awk.png

-------------------------------------

cp -v acronyms.txt glossary.txt

chmod +x lookup.awk

./lookup.awk
Enter a glossary term: AWK
Aho Weinberger Kernighan
Enter another glossary term (q to quit): COBOL
Common Business Oriented Language
Enter another glossary term (q to quit): q

cat glossary.txt
ACID Atomicity, Consistency, Isolation, Durability
AWK Aho Weinberger Kernighan
BASIC Beginner's All-Purpose Symbolic Instruction Code
CICS Customer Information Control System
COBOL Common Business Oriented Language
DBMS Data Base Management System
GIGO Garbage In, Garbage Out
GIRL Generalized Information Retrieval Language
PERL Practical Extraction and Reporting Language
FORTRAN Formula Translation

Program Piping

Because awk acts per line, it can alter the output of a program piped through it: e.g. a prime prog prints primes one per line, so awk can number each prime output per request:

./bash.sh
How many prime numbers ?: 4
1
2
3
5
7

./bash.sh | awk '{print NR, $0}'
How many prime numbers ?: 4
1 1
2 2
3 3
4 5
5 7
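The same trick summarises as well as numbers - an END rule reports after the piped program finishes (seq stands in for the prime prog here):

```shell
# number each incoming line, then report the total in END
seq 5 | awk '{ print NR, $0 } END { print "lines:", NR }'
```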