sed & awk

11. A Flock of awks

Contents:
Original awk
Freely Available awks
Commercial awks
Epilogue

In the previous four chapters, we have looked at POSIX awk, with only occasional reference to actual awk implementations that you would run. In this chapter, we focus on the different versions of awk that are available, what features they do or do not have, and how you can get them.

First, we'll look at the original V7 version of awk. The original awk lacks many of the features we've described, so this section mostly describes what's not there. Next, we'll look at the three versions whose source code is freely available. All of them have extensions to the POSIX standard; the extensions common to all three are discussed first. Finally, we'll look at three commercial versions of awk.

11.1 Original awk

In each of the sections that follow, we'll take a brief look at how the original awk differs from POSIX awk. Over the years, UNIX vendors have enhanced their versions of original awk; you may need to write small test programs to see exactly what features your old awk has or doesn't have.
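A small probe along these lines might look like the following sketch. The helper name has_feature is hypothetical. It assumes that your awk exits with a nonzero status when it rejects a program as a syntax error, and it reads input from /dev/null so that an old awk does not sit waiting for input after BEGIN.

```sh
#!/bin/sh
# Hypothetical helper: run a tiny awk program and report whether this awk
# accepted it. Assumes a syntax error makes awk exit with nonzero status.
# Input comes from /dev/null so an old awk does not wait on the terminal.
has_feature() {
	if awk "$2" </dev/null >/dev/null 2>&1; then
		echo "$1: yes"
	else
		echo "$1: no"
	fi
}

has_feature 'exponentiation' 'BEGIN { x = 2 ^ 3 }'
has_feature 'conditional'    'BEGIN { x = 1 ? 2 : 3 }'
has_feature 'delete'         'BEGIN { a[1] = 1; delete a[1] }'
has_feature 'in operator'    'BEGIN { a[1] = 1; if (1 in a) x = 1 }'
has_feature 'user functions' 'function f(n) { return n } BEGIN { f(1) }'
```

On a modern awk every line should report "yes"; any "no" points to a section below worth reading closely.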

11.1.1 Escape Sequences

The original V7 awk had only "\t", "\n", "\"", and, of course, "\\". Most UNIX vendors have added some or all of "\b", "\r", and "\f".
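To find out which of the later escapes your awk understands, one approach is to compare each escape with the character built from its ASCII code. The sketch below assumes that sprintf's "%c" conversion turns a number into the corresponding character; probe_escapes is a hypothetical name.

```sh
#!/bin/sh
# Check whether this awk translates "\b", "\r", and "\f" by comparing each
# escape with the character built from its ASCII code via sprintf("%c").
probe_escapes() {
	awk 'BEGIN {
		if ("\b" == sprintf("%c", 8)) print "\\b supported"; else print "\\b NOT supported"
		if ("\r" == sprintf("%c", 13)) print "\\r supported"; else print "\\r NOT supported"
		if ("\f" == sprintf("%c", 12)) print "\\f supported"; else print "\\f NOT supported"
	}' </dev/null
}

probe_escapes
```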

11.1.2 Exponentiation

Exponentiation (using the ^, ^=, **, and **= operators) is not in old awk.
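Without these operators, an integer power can be computed with a plain while loop, as in this sketch. It assumes each input line holds a base and a non-negative integer exponent, and it avoids user-defined functions because old awk lacks them too; power is a hypothetical name. For a positive base and any real exponent, exp(y * log(x)) computes x to the y using only the functions in Table 11.2.

```sh
#!/bin/sh
# Old-awk substitute for x ^ n: multiply in a loop. No ^, **, or
# user-defined functions are used. Assumes the exponent ($2) is a
# non-negative integer.
power() {
	awk '{
		result = 1
		n = $2
		while (n > 0) {
			result = result * $1
			n = n - 1
		}
		print result
	}'
}

echo "2 10" | power
```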

11.1.3 The C Conditional Expression

The three-argument conditional expression found in C, "expr1 ? expr2 : expr3" is not in old awk. You must resort to a plain old if-else statement.
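For example, a maximum that would be written with "?:" in POSIX awk can be rewritten as an if-else. This is a minimal sketch; larger is a hypothetical name, and it assumes two numeric fields per line.

```sh
#!/bin/sh
# Instead of the C-style  max = ($1 > $2) ? $1 : $2
# old awk needs an if-else statement.
larger() {
	awk '{
		if ($1 > $2)
			max = $1
		else
			max = $2
		print max
	}'
}

echo "3 7" | larger
```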

11.1.4 Variables as Boolean Patterns

You cannot use the value of a variable as a Boolean pattern.

flag { print "..." }

You must instead use a comparison expression.

flag != 0 { print "..." }
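To see the comparison form at work, here is a hypothetical sketch that prints only the lines between START and END marker lines. The marker words and the function name between_markers are illustrative, not taken from any real data.

```sh
#!/bin/sh
# Print only the lines between START and END markers. The flag pattern
# is written as a comparison, which old and new awk both accept.
between_markers() {
	awk '/^START$/ { flag = 1; next }
	     /^END$/   { flag = 0 }
	     flag != 0 { print }'
}

printf 'a\nSTART\nb\nc\nEND\nd\n' | between_markers
```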

11.1.5 Faking Dynamic Regular Expressions

The original awk made it difficult to use patterns dynamically because they had to be fixed when the script was interpreted. You can get around the problem of not being able to use a variable as a regular expression by importing a shell variable inside an awk program. The value of the shell variable will be interpreted by awk as a constant. Here's an example:

$ cat awkro2
#! /bin/sh
# assign shell's $1 to awk search variable
search=$1
awk '$1 ~ /'"$search"'/' acronyms

The search=$1 line makes the variable assignment before awk is invoked. To get the shell to expand the variable inside the awk procedure, we close the single-quoted awk program just before the variable, enclose the variable in double quotation marks, and then reopen the single quotes.[1] Thus, awk never sees the shell variable itself; it sees only the variable's value, which it treats as part of a constant regular expression.

[1] Actually, this is the concatenation of single-quoted text with double-quoted text with more single-quoted text to produce one large quoted string. This trick was used earlier, in Chapter 6, Advanced sed Commands.

Here's another version that makes use of the Bourne shell variable substitution feature. Using this feature gives us an easy way to specify a default value for the variable if, for instance, the user does not supply a command-line argument.

search=$1
awk '$1 ~ /'"${search:-.*}"'/' acronyms

The expression "${search:-.*}" tells the shell to use the value of search if it is set and not empty; otherwise, use ".*" as the value. Here, ".*" is regular-expression syntax matching any string of characters; therefore, all entries are printed if no argument is supplied on the command line. Because the whole thing is inside double quotes, the shell does not perform wildcard expansion on ".*".
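To see exactly what awk receives, we can build the program text in a shell variable and print it before running it. In this sketch the function name lookup and the two sample lines are made up; the sample input merely stands in for the acronyms file, and the program text goes to standard error so it does not mix with awk's output.

```sh
#!/bin/sh
# Build the awk program the same way as above, show the text awk
# actually receives, then run it on standard input.
lookup() {
	search=$1
	prog='$1 ~ /'"${search:-.*}"'/'
	printf 'awk program: %s\n' "$prog" >&2
	awk "$prog"
}

printf 'BASIC Beginners All-purpose Symbolic Instruction Code\nCICS Customer Information Control System\n' | lookup CICS
```

With the argument CICS, awk is handed the constant pattern /CICS/; with no argument, it is handed /.*/ and prints every line.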

11.1.6 Control Flow

In POSIX awk, if a program has only a BEGIN procedure and nothing else, awk exits after executing that procedure. The original awk is different: it executes the BEGIN procedure and then goes on to read input, even if there are no pattern-action statements. You can force awk to exit by supplying /dev/null on the command line as a data file argument, or by ending the BEGIN procedure with an exit statement.

In addition, the BEGIN and END procedures, if present, have to be at the beginning and end of the program, respectively. Furthermore, you can have only one of each.
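Both ways of ending a BEGIN-only program can be sketched as follows; the function names are hypothetical. Either form produces the same output under POSIX awk, so it is safe to use them everywhere.

```sh
#!/bin/sh
# Two portable BEGIN-only programs. An exit at the end of BEGIN stops
# old awk from going on to read input; so does giving it /dev/null as
# its only data file.
greet_exit() {
	awk 'BEGIN { print "hello, world"; exit }'
}
greet_devnull() {
	awk 'BEGIN { print "hello, world" }' /dev/null
}

greet_exit
greet_devnull
```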

11.1.7 Field Separating

Field separating works the same in old awk as it does in modern awk, except that the field separator cannot be a regular expression.
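When input mixes separators, one workaround for the missing regular-expression FS is to fold them into a single character before awk sees the input. This is a sketch under that assumption, using tr; fields is a hypothetical name. A modern awk could instead use FS = "[:,]".

```sh
#!/bin/sh
# Turn every "," into ":" with tr so that a single-character FS,
# which old and new awk both support, is enough to split the line.
fields() {
	tr ',' ':' | awk -F: '{ print NF, $3 }'
}

echo 'a:b,c' | fields
```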

11.1.8 Arrays

There is no way in the original awk to delete an element from an array. The best thing you can do is assign the empty string to the unwanted array element, and then code your program to ignore array elements whose values are empty.

Along the same lines, "in" is not an operator in original awk; you cannot write "if (item in array)" to test whether an item is present. Unfortunately, this forces you to loop through every item in an array to see if the index you want is present.

found = 0
for (item in array) {
	if (item == searchkey) {
		# searchkey is an index of array; process array[item] here
		print array[item]
		found = 1
		break
	}
}
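The empty-string workaround for the missing delete can be sketched like this. The array contents and the name live_counts are hypothetical. Note the limitation: the element still exists, so a genuinely empty value cannot be told apart from a "deleted" one.

```sh
#!/bin/sh
# Emulate delete by storing the empty string, then skip empty
# elements when looping over the array.
live_counts() {
	awk 'BEGIN {
		count["apples"] = 3
		count["pears"] = 5
		count["apples"] = ""        # stands in for: delete count["apples"]
		for (k in count)
			if (count[k] != "")
				print k, count[k]
	}' </dev/null
}

live_counts
```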

11.1.9 The getline Function

The original V7 awk did not have getline. If your awk is really ancient, getline may not work at all. Some vendors provide the simplest form of getline, which reads the next record from the regular input stream and sets $0, NF, and NR (there is no FNR; see below). None of the other forms of getline is available.
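Here is a sketch that uses only that simplest form, joining input lines in pairs. It assumes getline returns 0 at end of input, as POSIX awk's does; whether a particular vendor's old getline returns a usable value is something to check with a test program. join_pairs is a hypothetical name.

```sh
#!/bin/sh
# Uses only plain "getline", which reads the next record into $0
# and updates NF and NR. Prints input lines joined in pairs; an odd
# last line is printed alone.
join_pairs() {
	awk '{
		first = $0
		if (getline > 0)
			print first, $0
		else
			print first
	}'
}

printf 'a\nb\nc\n' | join_pairs
```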

11.1.10 Functions

The original awk had only a limited number of built-in string functions. (See Table 11.1 and Table 11.3.)

Table 11.1: Original awk's Built-In String Functions
Awk Function          Description
index(s,t)            Returns position of substring t in string s or zero if not present.
length(s)             Returns length of string s or length of $0 if no string is supplied.
split(s,a,sep)        Parses string s into elements of array a using field separator sep; returns number of elements. If sep is not supplied, FS is used. Array splitting works the same way as field splitting.
sprintf("fmt",expr)   Uses printf format specification for expr.
substr(s,p,n)         Returns substring of string s at beginning position p up to maximum length of n. If n isn't supplied, the rest of the string from p is used.
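A quick way to get a feel for these functions is to call each one on a fixed string, as in this sketch; string_demo is a hypothetical name and the strings are arbitrary.

```sh
#!/bin/sh
# Exercise each function from Table 11.1 on fixed strings.
string_demo() {
	awk 'BEGIN {
		s = "sed and awk"
		print index(s, "awk")              # 9
		print length(s)                    # 11
		n = split("a:b:c", parts, ":")
		print n, parts[2]                  # 3 b
		print sprintf("%05.1f", 3.14159)   # 003.1
		print substr(s, 5, 3)              # and
	}' </dev/null
}

string_demo
```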

Some built-in functions can be classified as arithmetic functions. Most of them take a numeric argument and return a numeric value. Table 11.2 summarizes these arithmetic functions.

Table 11.2: Original awk's Built-In Arithmetic Functions
Awk Function   Description
exp(x)         Returns e to the power x.
int(x)         Returns truncated value of x.
log(x)         Returns natural logarithm (base-e) of x.
sqrt(x)        Returns square root of x.
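The sketch below exercises these functions; math_demo is a hypothetical name. It also shows a common idiom built from int(): adding 0.5 before truncating rounds a non-negative number to the nearest integer.

```sh
#!/bin/sh
# Exercise the functions from Table 11.2.
math_demo() {
	awk 'BEGIN {
		print int(-3.7)        # -3: int truncates toward zero
		print int(2.5 + 0.5)   # 3: add 0.5 first to round a non-negative number
		print sqrt(16)         # 4
		print exp(0)           # 1
		print log(exp(2))      # 2
	}' </dev/null
}

math_demo
```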

One of the nicest facilities in awk, the ability to define your own functions, is also not available in original awk.

11.1.11 Built-In Variables

In original awk only the variables shown in Table 11.3 are built in.

Table 11.3: Original awk System Variables
Variable   Description
FILENAME   Current filename
FS         Field separator (a blank)
NF         Number of fields in current record
NR         Number of the current record
OFMT       Output format for numbers (%.6g)
OFS        Output field separator (a blank)
ORS        Output record separator (a newline)
RS         Record separator (a newline)

OFMT does double duty, serving as the conversion format for the print statement, as well as for converting numbers to strings.
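The following sketch shows OFMT at work in print. In POSIX awk the second job, converting numbers to strings, belongs to a separate variable, CONVFMT, so the concatenated value keeps the default "%.6g" format there; under original awk it would follow OFMT instead. ofmt_demo is a hypothetical name.

```sh
#!/bin/sh
# OFMT controls how print shows non-integer numbers.
ofmt_demo() {
	awk 'BEGIN {
		OFMT = "%.2f"
		x = 3.14159
		print x          # 3.14
		y = x ""         # number-to-string conversion
		print y          # 3.14159 in POSIX awk (CONVFMT); 3.14 in original awk
	}' </dev/null
}

ofmt_demo
```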
