stevepedwards.com/DebianAdmin linux mint IT admin tips info

Experiment with Pipes, Redirection, Command Substitution and Variable Expansion

After you read this article, I'll let you think of interesting ways to use the above video example of piping data between terminals into other programs for immediate processing. If the commands are not clear: term 1 creates a fifo pipe in my home dir:

mkfifo fifo_pipe

term 2 cats that pipe to receive what comes out of that fifo_pipe then pipes that output directly into python :

cat fifo_pipe | python

A python source file is then streamed into the pipe from term 1, and the program is run and its results output in term 2:

cat Downloads/sieve1.py >> fifo_pipe
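For anyone without two terminals to hand, the same three steps can be sketched in a single script, with a background job standing in for term 1. This is only a sketch: the temporary directory, the one-line program text and the use of sh rather than python as the interpreter are assumptions made so that it runs almost anywhere.

```shell
# Sketch only: both "terminals" in one script. The temp dir, the program text
# and the use of sh (instead of python) are assumptions for illustration.
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/fifo_pipe"

# "term 1": stream a tiny program into the pipe. Opening the pipe for writing
# blocks until a reader opens it, so this is run in the background.
printf 'echo "hello from the pipe"\n' >> "$tmpdir/fifo_pipe" &

# "term 2": read whatever comes out of the pipe and feed it to an interpreter.
fifo_result=$(cat "$tmpdir/fifo_pipe" | sh)
wait

echo "$fifo_result"
rm -r "$tmpdir"
```

Swapping sh for python, and the printf line for a cat of a real source file, gives the two-terminal version.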

You REALLY have to understand the fundamentals first by playing with these commands yourself; they may not do what you expect. The examples are trial and error-ed to be sure what happens, with reference to Unix:A DB Approach – many things have changed from older versions – file safety for accidental overwrites for example.

There are 3 main data channels called file descriptors the shell uses for passing data to and from files, screen and commands: Standard Input, Standard Output and Standard Error, numbered 0,1 and 2 resp.

Certain commands can take input from a file and operate on it in some way. As all data on Unix like systems is a “file” of some sort containing a byte stream of one form or other (though an empty file has no bytes to stream), this byte stream can be the input to these commands; the command (usually) performs an operation of some sort on that stream, then outputs the result (usually) as another byte stream.

For example, cat can read a byte stream from “standard input”, which is data from the keyboard by default, then outputs it, unchanged, to standard output, which is by default, the terminal.

$ cat (Rtn)

this is the input stream and future output stream(Rtn)

this is the input stream and future output stream (CtrlD)

What is not so obvious at this point is that the input from the keys is also being echoed to the screen while it is buffered a line at a time, so you can read what you are typing before cat actually gets it to spit it back out again. It is the terminal driver in the kernel, not the shell or cat, that does this echoing and line buffering (the cat command itself neither knows nor cares about this data handling); each completed line is then passed to cat via Standard Input (0), which cat outputs to the terminal via Std Output (1). If cat is asked to read a non-existent file then the error msg is output to the terminal via Std Error (2).
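That stdout and stderr really are separate channels can be checked by capturing them one at a time. The sketch below assumes a path that does not exist on your system; the variable names are just for illustration.

```shell
# Sketch: capture stdout (1) and stderr (2) separately to show they are
# distinct channels. The path below is assumed not to exist.
missing=/nonexistent/testfile.txt

# stdin (0) -> cat -> stdout (1); nothing is written to stderr here.
out=$(printf 'this is the input stream\n' | cat)

# A failing cat writes nothing to stdout and a message to stderr.
# 2>&1 first points stderr at the capture, then >/dev/null discards stdout.
err=$(cat "$missing" 2>&1 >/dev/null)

echo "stdout carried: $out"
echo "stderr carried: $err"
```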

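The channel used can be made explicit by putting its number before the chevron (e.g. 2>), with an & before a number on the right of the chevron (e.g. >&2) meaning “that descriptor” rather than a file of that name. Behaviour like file creation, with or without an explicit descriptor, varies depending on the command and the redirection type attempted.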

The ls command takes no input from stdin; with no switches or arguments it defaults to listing all non hidden files and folders in the current directory, here my home directory:

$ ls

750GB Dellmint Documents Music Public Videos

Cprogs Desktop Downloads Pictures Templates

If Std out is set explicitly, you get the same output as you would expect as the target is the screen:

$ ls >&1

750GB Dellmint Documents Music Public Videos

Cprogs Desktop Downloads Pictures Templates

You get the same for Stderror as its default target is also the screen:

$ ls >&2

750GB Dellmint Documents Music Public Videos

Cprogs Desktop Downloads Pictures Templates

The difference between them is that the output from ls is Stdout (1) by default, which you are now forcing out of the Stderror (2) channel.
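On screen the two look identical, so the difference only shows up when one channel is captured or discarded. A minimal sketch, using a throwaway directory with two made-up entries:

```shell
# Sketch: the directory and its two entries are made up for the demonstration.
tmpdir=$(mktemp -d)
touch "$tmpdir/Documents" "$tmpdir/Music"

# Default: ls writes to stdout, so the capture sees the listing.
on_stdout=$(ls "$tmpdir")

# Forced onto stderr: stdout now carries nothing. stderr is pointed at
# /dev/null first (redirections apply left to right), so the listing vanishes.
on_stdout_after=$(ls "$tmpdir" 2>/dev/null >&2)

echo "default stdout: $on_stdout"
echo "after >&2:      [$on_stdout_after]"
rm -r "$tmpdir"
```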

Similarly, cat can read a file if it exists, and its Stdout (1) can be redirected to a filename, which the shell will create if it does not exist already. Whether an existing file is overwritten depends on the shell's noclobber option; on my system it is NOT overwritten:

stevee@AMD ~ $ cat testfile.txt

cat: testfile.txt: No such file or directory

The file does not exist yet so cat cannot read from it, so it outputs error text via stderror (2) to the screen; but the file can be created using the stdout redirection ">". The shell creates an empty file to hold the following text when it is typed, then each line of input is written to the file after Rtn is pressed (CtrlD ends the input):

stevee@AMD ~ $ cat > testfile.txt

this text goes into the file but not to the screen (except via the shell)

Notice you did not see duplicate lines as above when Rtn was hit, because std out was redirected into the newly created file, not the screen, by the right chevron >.

Now cat can open the file, as it exists:

stevee@AMD ~ $ cat testfile.txt

this text goes into the file but not to the screen

Above, the new file's contents have been read by cat, then gone via stdout to the screen again.

My Mint shell does not allow the owner to overwrite his file with >, as was older behaviour; this is what bash does when its noclobber option is set:

stevee@AMD ~ $ cat > testfile.txt

bash: testfile.txt: cannot overwrite existing file
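The “cannot overwrite existing file” message is the one bash gives when its noclobber option is on. A minimal sketch of switching that protection on, overriding it for one command with >|, and switching it off again; the file name and text are only examples.

```shell
# Sketch: the file name and contents are examples only.
tmpdir=$(mktemp -d)
f="$tmpdir/testfile.txt"
echo "first" > "$f"

set -o noclobber                    # refuse to overwrite existing files with >
# Run the attempt in a subshell so the failed redirection cannot stop the script.
if (echo "second" > "$f") 2>/dev/null; then
  overwritten=yes
else
  overwritten=no
fi
echo "third" >| "$f"                # >| overrides noclobber for one command
set +o noclobber                    # back to plain overwriting
echo "fourth" > "$f"

final=$(cat "$f")
echo "overwritten while noclobber was on: $overwritten; final content: $final"
rm -r "$tmpdir"
```

Typing set -o on its own lists the current state of noclobber along with the other shell options.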

It is IMPORTANT to realise command order and position, descriptor number (0, 1 or 2) and chevron direction at this point, from a human reading reference. For example, a file named as an argument follows the command, left to right as above – but the shell parses the whole line and sets up any redirections first, and only then runs the command:

$ cat testfile.txt

this text goes into the file but not to the screen

but showing the descriptor number, and with the chevron pointing in the direction the data flows (right to left for stdin), it is:

$ cat 0< testfile.txt

this text goes into the file but not to the screen

$ cat < testfile.txt

this text goes into the file but not to the screen

Commonly, the overall command line placement order left to right is cmd, input file, output file:

stevee@AMD ~ $ cat < testfile.txt > outfile.txt

Above, the data flows right to left for stdin (file into cat), then left to right for stdout (cat into file).

$ cat outfile.txt

this text goes into the file but not to the screen

Likewise, the wordcount command wc takes a file as its argument and outputs the counts (lines, words, bytes) along with the name of the input file:

$ wc testfile.txt

1 11 51 testfile.txt

$ wc testfile.txt >&1

1 11 51 testfile.txt

BUT, notice the difference if stdin is set explicitly for the input file – this time the shell, not the command opens the file, so no filename is given:

wc 0< testfile.txt

1 11 51
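The two ways of handing wc the same file can be put side by side in a script. This sketch recreates testfile.txt in a temporary directory (an assumption, so as not to touch your home dir) and tidies any padding that some wc versions add.

```shell
# Sketch: testfile.txt is recreated in a temp dir for the comparison.
tmpdir=$(mktemp -d)
printf 'this text goes into the file but not to the screen\n' > "$tmpdir/testfile.txt"

# wc opens the file itself, so it knows (and prints) the name.
with_name=$(cd "$tmpdir" && wc -w testfile.txt)
# The shell opens the file and hands wc an anonymous stdin: count only.
count_only=$(wc -w < "$tmpdir/testfile.txt")

# Collapse any padding some wc versions add, so the values are easy to reuse.
with_name=$(echo $with_name)
count_only=$(echo $count_only)
echo "argument: $with_name"
echo "stdin:    $count_only"
rm -r "$tmpdir"
```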

This is important if you only want the data about the file contents output, without the name, as this may become the input for another command. This may be done using a pipe, if the command itself can accept stdout from a prior command as stdin e.g. counting the words in testfile.txt:

$ cat testfile.txt

this text goes into the file but not to the screen

stevee@AMD ~ $ cat testfile.txt | wc -w

11

The cat command can also comply:

$ wc -w testfile.txt

11 testfile.txt

stevee@AMD ~ $ wc -w testfile.txt | cat

11 testfile.txt

As the above output is to stdout, it follows you could keep redirecting it via different descriptors – stdout or stderror – and it still ends up on the screen. The redirections are applied left to right, so the last one decides which channel is actually used:

$ wc -w testfile.txt | cat >&1 >&1 >&1

11 testfile.txt

$ wc -w testfile.txt | cat >&2 >&1 >&2

11 testfile.txt
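When the same descriptor is redirected more than once on one command, bash (and sh) works through the redirections left to right and the last one is in force when the command runs. With the screen as every target you cannot see this, so the sketch below uses two made-up file names instead.

```shell
# Sketch: "first" and "second" are hypothetical file names.
tmpdir=$(mktemp -d)

# stdout is pointed at "first", then re-pointed at "second": both files are
# created, but only the last target receives the data.
echo data > "$tmpdir/first" > "$tmpdir/second"

first_size=$(wc -c < "$tmpdir/first" | tr -d ' ')
second_text=$(cat "$tmpdir/second")
echo "first holds $first_size bytes, second holds: $second_text"
rm -r "$tmpdir"
```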

You may want the output of many commands sent to a file, or many txt files cat'd to one:

$ who;ls;date;cal

stevee tty8 2016-07-02 22:59 (:0)

stevee pts/1 2016-07-02 23:13 (:0)

stevee pts/2 2016-07-03 02:46 (:0)

RSYNCtest.txt

Sun Jul 3 02:56:58 BST 2016

     July 2016
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31

{ who; ls; date; cal; } > file.txt
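Be aware that a redirection belongs only to the single command it is attached to, so a list separated by ; needs grouping with { } if all of its output is to land in the file. A small sketch with echo standing in for who, ls, date and cal; the file names are made up.

```shell
# Sketch: echo stands in for the real commands; file names are hypothetical.
tmpdir=$(mktemp -d)

# Without grouping, only the last command is redirected;
# "one" still goes to the screen.
echo one; echo two > "$tmpdir/last_only.txt"

# With { ...; } the whole list shares one redirection.
{ echo one; echo two; } > "$tmpdir/grouped.txt"

last_only=$(cat "$tmpdir/last_only.txt")
grouped=$(cat "$tmpdir/grouped.txt")
echo "ungrouped file holds: $last_only"
echo "grouped file holds:   $grouped"
rm -r "$tmpdir"
```

Note the spaces inside the braces and the ; before the closing brace; the shell needs both.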

You may create many files, say book chapters to fill then cat to one final:

touch chapter{1..3}.txt; ls

chapter1.txt chapter2.txt chapter3.txt

You can now append to each chapter via the keys (-), without seeing what is already in there though:

cat - >> chapter1.txt

this will go into chap 4 soon (CtrlD)

then concatenate all the chapters into a final file:

cat chapter* > chapter4

cat chapter4

this will go into chap 4 soon

To dump stderror into /dev/null or the “bit bucket” away from screen or script output:

cat abc

cat: abc: No such file or directory

cat abc 2> /dev/null

(i.e no stderror shown here)

You may have overlooked the obvious in the pipe example above, but a pipe does away with the step of creating an intermediary file before the next command can use that for stdin.

Strip a column from a file using a pipe:

cat chapter4 | awk '{print $7}'

soon
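To see what the pipe saves you, here is the same column strip done both ways: once through an intermediary file and once through a pipe. The temporary directory and the name of the intermediary file are assumptions for the sketch.

```shell
# Sketch: chapter4 is recreated in a temp dir; "intermediate" is a made-up name.
tmpdir=$(mktemp -d)
echo 'this will go into chap 4 soon' > "$tmpdir/chapter4"

# The long way: write an intermediary file, then read it back as stdin.
cat "$tmpdir/chapter4" > "$tmpdir/intermediate"
via_file=$(awk '{print $7}' < "$tmpdir/intermediate")

# The pipe: the same data goes straight from one command to the next.
via_pipe=$(cat "$tmpdir/chapter4" | awk '{print $7}')

echo "via file: $via_file"
echo "via pipe: $via_pipe"
rm -r "$tmpdir"
```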

This piping feature was added to Unix in the 70's, allowing chains of commands to be strung together to form a programming language of sorts – shell scripting, the Unix counterpart of Windows batch files – where these cmd strings can be saved and run as an executable file. The command pipe is saved in a text file first, usually given a .sh extension for ID, made executable with chmod +x, then run using either of:

$ ./soonpipe.sh

soon

$ sh soonpipe.sh

soon
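The contents of soonpipe.sh are not shown here, so the following is a guess at the smallest script that would give that output: the column-strip pipe with a #!/bin/sh first line. It is built in a temporary directory so that nothing in your home dir is touched.

```shell
# Sketch: the script body is an assumption, not the author's actual file.
tmpdir=$(mktemp -d)
echo 'this will go into chap 4 soon' > "$tmpdir/chapter4"

# Save the pipe in a text file; the quoted 'EOF' stops $7 being expanded now.
cat > "$tmpdir/soonpipe.sh" <<'EOF'
#!/bin/sh
cat chapter4 | awk '{print $7}'
EOF

chmod +x "$tmpdir/soonpipe.sh"                  # needed for the ./ form only
direct=$(cd "$tmpdir" && ./soonpipe.sh)         # run as a program
via_sh=$(cd "$tmpdir" && sh soonpipe.sh)        # run as an argument to sh

echo "direct: $direct"
echo "via sh: $via_sh"
rm -r "$tmpdir"
```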

The above is fine when just the output of a command is the desired input to the next, but what if you want the result of the argument of the first command to be the next input? The first command has to complete first, before its result is sent. It depends on the order in which the shell processes special characters. According to my UnixDB book, it's:

Parsing white space; Variables; Command Sub; Redirection; Wildcard file expansion patterns (*?![]); Cmd PATH.

Create 3 files, page1-3; echo a word or phrase into each;

echo 'once' > page1

echo 'upon' > page2
echo 'a time' > page3

A wildcard is common in file expansion to simplify a file listing and is evaluated before the command is found from the above list e.g:

$ cat page1 page2 page3

once

upon

a time

becomes:

$ cat page*

once

upon

a time

OR:

cat page[1-3]

once

upon

a time

OR as ls also lists the files:

$ ls page*

page1 page2 page3
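That it is the shell, not the command, doing the wildcard work can be shown with echo, which never looks in a directory and only prints the words it is handed. A sketch in a temporary directory; the nosuch* pattern is made up to show what happens when nothing matches (bash's default behaviour is assumed).

```shell
# Sketch: run in a temp dir so only the three page files exist.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
echo 'once' > page1; echo 'upon' > page2; echo 'a time' > page3

# The shell replaces each pattern with the matching names before echo runs.
star=$(echo page*)
range=$(echo page[1-3])
# A pattern that matches nothing is passed through as typed (default setting).
nomatch=$(echo nosuch*)

echo "page*     -> $star"
echo "page[1-3] -> $range"
echo "nosuch*   -> $nomatch"
cd / && rm -r "$tmpdir"
```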

The above file list output, if expanded first by command substitution, $( ), can become the serial arguments for cat – so the contents are read – not the file names:

$ cat $(ls page*)

once

upon

a time

BUT without the $, the brackets just run ls in a subshell:

(ls page*)
page1 page2 page3

SO,

cat 0< (ls page*) does not work as you may think:

bash: syntax error near unexpected token `('

and neither does this, as it feeds the files' names to cat as text on stdin, not their content:

(ls page*) | cat

page1

page2

page3
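The difference comes down to whether the names reach cat as arguments or as stdin. The sketch below captures both cases, plus xargs, a standard tool that turns stdin into arguments and so gets the pipe form to read contents as well. It assumes simple file names with no spaces in them.

```shell
# Sketch: assumes file names without spaces; run in a temp dir.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
echo 'once' > page1; echo 'upon' > page2; echo 'a time' > page3

as_arguments=$(cat $(ls page*))      # names become arguments: contents are read
as_stdin=$(ls page* | cat)           # names arrive as text on stdin: names come out
via_xargs=$(ls page* | xargs cat)    # xargs turns stdin back into arguments

echo "arguments: $as_arguments"
echo "stdin:     $as_stdin"
echo "xargs:     $via_xargs"
cd / && rm -r "$tmpdir"
```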

Now below, you may see how the expansion worked in the PDF eBook Post for pdfunite to cat multiple separate PDFs into a final PDF:

pdfunite $(ls -tr stevepedwards.com-*) AllPosts.pdf

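So what is the $ sign doing in that above example? Together with the brackets it is command substitution: the shell runs the ls command first and puts its output on the command line in place of $(...), split into separate words. Plain words preceded by a $ are evaluated as variables, unless single quoted or escaped.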

You know that;

ls page*

page1 page2 page3

So you can make this operation a variable e.g:

list="ls page*"

The variable holds only the text of the command; when used on its own as a command line, the shell expands it and runs it:

$list

page1 page2 page3

This is just text, though, not the files themselves: when $list is used unquoted, the shell substitutes the content of the quotes – the word ls and the expansion of page* – as separate words. These then become the arguments for cat, which gives almost the desired outcome, except that ls itself is treated as a file name, so not quite what is required due to the stderror part, but the expansion works:

cat $list

cat: ls: No such file or directory

once

upon

a time
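It may help to look at what is actually stored in each case. In this sketch list holds the text of the command, while files (a name made up for the comparison) holds the output of the command via $( ); quoting the variable in echo shows the stored text untouched.

```shell
# Sketch: "files" is a made-up variable name; run in a temp dir.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
echo 'once' > page1; echo 'upon' > page2; echo 'a time' > page3

list="ls page*"        # a string: the quotes stop the * expanding here
files=$(ls page*)      # command substitution: the output of ls is stored

echo "list holds:  $list"
echo "files holds: $files"

# Unquoted, $list becomes the words: ls page1 page2 page3.
# cat cannot find a file called ls, so that complaint is sent to /dev/null.
from_list=$(cat $list 2>/dev/null)
from_files=$(cat $files)
cd / && rm -r "$tmpdir"
```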

YET using () instead of “” as the variable definition container lists ALL files in the directory!! SEEMINGLY weird at first, and not wanted – the “page” prefix appears to be ignored. The reason is that list=(...) defines a bash array whose elements are ls, page1, page2 and page3, and a bare $list means only the first element, ls; ls alone gives this listing:

list=(ls page*)

$list

page1 page2 page3 soonpipe.sh
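In bash, name=(...) is an array assignment, which can be checked by counting the elements. A sketch that runs the bash-only part through bash -c so it also works when the outer script is run by plain sh; the files are empty stand-ins.

```shell
# Sketch: empty stand-in files in a temp dir; bash is assumed to be installed.
tmpdir=$(mktemp -d)
touch "$tmpdir/page1" "$tmpdir/page2" "$tmpdir/page3" "$tmpdir/soonpipe.sh"

array_report=$(cd "$tmpdir" && bash -c '
  list=(ls page*)                 # array of four words: ls page1 page2 page3
  echo "elements: ${#list[@]}"
  echo "first: $list"             # a bare $list is the same as ${list[0]}
  echo "all: ${list[*]}"
')

echo "$array_report"
rm -r "$tmpdir"
```

So typing $list on its own runs just ls, with no arguments, which is why everything in the directory is listed.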

YET: with command substitution, $(...), the expansion is correct, as you can see by listing it:

ls $(ls page*)

page1 page2 page3

You can see why you need to experiment yourself!

Command substitution only produces words in the context of a command that uses them – when fed to cat, the output of ls is split into separate words and gives the result you want, which is each file as a separate argument:

cat $(ls page*)

once

upon

a time

This is why the $(cmd) format works for supplying the arguments (not stdin) to pdfunite also, which seems to be a pdf specific cat command.

This also provides the answer to the question of appending port numbers to the -p switch of nmap that I could not find an answer to last year – again the $( ) substitutes the contents of the single line, comma delimited BadPortsCommas.txt file as the argument and nmap runs it:

cat BadPortsCommas.txt

...65432,65530,65535

nmap 127.0.0.1 -p $(cat BadPortsCommas.txt)

Starting Nmap 6.40 ( http://nmap.org ) at 2016-07-04 11:20 BST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00049s latency).
Not shown: 786 closed ports
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
80/tcp open http
139/tcp open netbios-ssn
445/tcp open microsoft-ds

Nmap done: 1 IP address (1 host up) scanned in 0.12 seconds
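What nmap actually receives can be checked without running a scan. In this sketch a small function, show_args (made up for the purpose), stands in for nmap and reports its arguments; the three-port list is hypothetical and far shorter than the real file.

```shell
# Sketch: show_args and the short port list are stand-ins, not the real thing.
tmpdir=$(mktemp -d)
printf '22,25,80\n' > "$tmpdir/BadPortsCommas.txt"

# Stand-in for nmap: report how many arguments arrive and what they are.
show_args() { echo "$# args: $*"; }

# With no spaces in the file, the substitution yields a single word after -p.
seen=$(show_args 127.0.0.1 -p $(cat "$tmpdir/BadPortsCommas.txt"))
echo "$seen"
rm -r "$tmpdir"
```

Had the file contained spaces or several lines, the unquoted substitution would have been split into several arguments instead.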

Clever stuff...

It is simple enough for two piped commands to be predicted where you know the output of a cmd is what you want as the input for the next – left to right, but can become difficult to foresee as more commands are added – especially if command substitution is involved, as its result only appears in the context of a command; you can't type $(ls page*) alone to see it, for example, without the shell trying to run the result (echo $(ls page*) will show it) – this makes building complex pipes, well...complex!
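One harmless command context for inspecting a substitution is printf, which here prints each resulting word in its own pair of square brackets so you can see exactly where the shell has split them. A sketch with empty stand-in files:

```shell
# Sketch: empty stand-in files in a temp dir.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch page1 page2 page3

# printf repeats its format once per argument, so each word gets its own line.
preview=$(printf '[%s]\n' $(ls page*))
echo "$preview"
cd / && rm -r "$tmpdir"
```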

All you know for a pipeline is that the first command must write to stdout, the last in the chain must read stdin, and any in between must do both, for it to hope to work. There is no limit to the number of cmds in a pipe line. Whether your pipe works for what you intend is another matter. “A lot is left to knowledge of filters, perseverance and imagination”.

This one is from my Linux Tutor Course - can you work out what it does as a combination of filters and a pipe?:

awk -F: '{print $0}' /etc/group | grep 100.:

Note that awk is a line oriented programming language in itself, with "actions" defined in {}.  

{print $0} = {print} = {print $null} in this case, so the whole file (all records, or lines) is printed across all columns (fields), ignoring usual column delimiters such as " : ".

From man awk , the $ in awk's case does not mean a variable but is an operator for column numbers: $ = Field reference.

Try it – it's safe because awk, like many other commands that can be used in pipes, does not write the result of its operation back to the source file.

But why is there output from this file and /etc/passwd but not from /etc/shadow?
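The awk field references mentioned above can be tried on a few made-up lines in the same colon separated layout as /etc/group, without touching the real file. The group names and numbers below are invented for the sketch.

```shell
# Sketch: sample lines are invented, in /etc/group layout (name:x:gid:members).
sample='root:x:0:
adm:x:4:syslog
stevee:x:1000:'

whole=$(printf '%s\n' "$sample" | awk -F: '{print $0}')   # $0 = the whole record
names=$(printf '%s\n' "$sample" | awk -F: '{print $1}')   # $1 = first field
gids=$(printf '%s\n' "$sample" | awk -F: '{print $3}')    # $3 = third field

echo "names: $(echo $names)"
echo "gids:  $(echo $gids)"
```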

What about this...? Ring any bells from the Mint Tutor Course or GUI admin explorations? 

awk -F: '{print $1}' /etc/group | sort | head -14

adm
audio
avahi
backup
bin
cdrom
crontab
daemon
dialout
dip
disk
fax
floppy
fuse

[Screenshot: grpadmin.png]

Some filters that read stdin and write stdout, so can be used in pipes, are:

pr; head; tail; cut; paste; sort; uniq; nl; awk; sed; grep; join; tr

Read their man pages for use. e.g:

ls | pr -3 | head

[Screenshot: prhead.png]

Remove the headers and tail formatting blank lines from the pr command:

ls Documents/  | pr -3 | head | sed -n '1,2!p' | sed -n '2,3!p' | sed -n '5,7!p'

[Screenshot: docssed.png]
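Each sed stage in that pipe has the form sed -n 'M,N!p', meaning “print every line except M to N”. The sketch below shows one such stage on five made-up lines, next to the equivalent sed 'M,Nd' (delete lines M to N and print the rest) which some find easier to read.

```shell
# Sketch: five invented lines stand in for the pr output.
lines=$(printf '%s\n' header1 header2 docA docB docC)

negated=$(printf '%s\n' "$lines" | sed -n '1,2!p')   # print all but lines 1-2
deleted=$(printf '%s\n' "$lines" | sed '1,2d')       # delete lines 1-2

echo "$negated"
```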

No doubt every linux admin has a large collection of general and specific pre-defined pipes in his toolkit, tested specifically for his system and duties.
