In the previous four chapters, we have looked at POSIX awk, with only occasional reference to actual awk implementations that you would run. In this chapter, we focus on the different versions of awk that are available, what features they do or do not have, and how you can get them.
First, we'll look at the original V7 version of awk. The original awk lacks many of the features we've described, so this section mostly describes what's not there. Next, we'll look at the three versions whose source code is freely available. All of them have extensions to the POSIX standard. Those that are common to all three versions are discussed first. Finally, we look at three commercial versions of awk.
In each of the sections that follow, we'll take a brief look at how the original awk differs from POSIX awk. Over the years, UNIX vendors have enhanced their versions of original awk; you may need to write small test programs to see exactly what features your old awk has or doesn't have.
For escape sequences in strings, the original V7 awk had only "\t", "\n", "\"", and, of course, "\\". Most UNIX vendors have added some or all of "\b", "\r", and "\f".
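As noted earlier, a tiny test program is the easiest way to find out what your awk supports. The sketch below uses only the four escapes that V7 awk had, so it should print the same thing on old and new awks alike. It assumes some awk is on your PATH, and the function name show_escapes is made up for this example.

```shell
# show_escapes (hypothetical name): prints a string that uses only
# the V7 escapes \t, \n, \" and \\.
show_escapes() {
	echo x | awk '{ print "name\tvalue\nsaid \"hi\"\nC:\\tmp" }'
}

show_escapes
```

The output is three lines: "name" and "value" separated by a tab, then said "hi", then C:\tmp.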
Exponentiation (using the ^, ^=, **, and **= operators) is not in old awk.
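Without ^ or **, an integer power has to be built up by repeated multiplication. This is a minimal sketch assuming a non-negative integer exponent; pow is a hypothetical helper name, and the numbers come in as an input line rather than through the -v option, which is not in the original awk.

```shell
# pow BASE N (hypothetical helper): prints BASE raised to the
# non-negative integer N, multiplying in a loop instead of using ^ or **.
pow() {
	echo "$1 $2" | awk '{
		result = 1
		for (i = 0; i < $2; i++)
			result *= $1
		print result
	}'
}

pow 2 10    # 1024
pow 3 0     # 1
```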
The three-argument conditional expression found in C, "expr1 ? expr2 : expr3", is not in old awk. You must resort to a plain old if-else statement.
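Here is what the if-else rewrite looks like in practice. The sketch labels each input number; the comment shows the conditional expression it replaces, and sign_of is a hypothetical name.

```shell
# sign_of (hypothetical name): labels each number read on stdin.
# In POSIX awk this could be:
#   label = ($1 < 0) ? "negative" : "non-negative"
sign_of() {
	awk '{
		if ($1 < 0)
			label = "negative"
		else
			label = "non-negative"
		print $1, label
	}'
}

printf '%s\n' -3 0 7 | sign_of
```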
You cannot use the value of a variable as a Boolean pattern. For example, this does not work in old awk:
flag { print "..." }
You must instead use a comparison expression.
flag != 0 { print "..." }
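A complete program in this style might set the flag in one rule and test it in another. This sketch (after_marker is a hypothetical name) prints every line that follows a line reading START; placing the test before the assignment keeps the START line itself out of the output.

```shell
# after_marker (hypothetical name): prints the lines after the first "START".
# The flag is tested with an explicit comparison, as old awk requires.
after_marker() {
	awk '
	flag != 0       { print }
	$0 == "START"   { flag = 1 }
	'
}

printf 'a\nSTART\nb\nc\n' | after_marker    # prints b and c
```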
The original awk made it difficult to use patterns dynamically because they had to be fixed when the script was interpreted. You can get around the problem of not being able to use a variable as a regular expression by importing a shell variable inside an awk program. The value of the shell variable will be interpreted by awk as a constant. Here's an example:
$ cat awkro2
#! /bin/sh
# assign shell's $1 to awk search variable
search=$1
awk '$1 ~ /'"$search"'/' acronyms
The script makes the variable assignment before awk is invoked. To get the shell to expand the variable inside the awk procedure, we close the single quotes, enclose the variable in double quotes, and then reopen the single quotes.[1] Thus, awk never sees the shell variable; it sees only the variable's value, which it evaluates as a constant regular expression.
[1] Actually, this is the concatenation of single-quoted text with double-quoted text with more single-quoted text to produce one large quoted string. This trick was used earlier, in Chapter 6, Advanced sed Commands.
Here's another version that makes use of the Bourne shell variable substitution feature. Using this feature gives us an easy way to specify a default value for the variable if, for instance, the user does not supply a command-line argument.
search=$1
awk '$1 ~ /'"${search:-.*}"'/' acronyms
The expression "${search:-.*}" tells the shell to use the value of search if it is set and not empty; otherwise, it uses ".*" as the value. Here, ".*" is regular-expression syntax matching any string of characters; therefore, all entries are printed if no argument is supplied on the command line. Because the expression is inside double quotes, the shell does not perform wildcard expansion on ".*".
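To try the default-value version without the book's data, you can create a small stand-in acronyms file. The file contents and the script name awkro3 below are assumptions for illustration, and the sketch writes the file into the current directory.

```shell
# Stand-in for the acronyms file; these contents are made up for the sketch.
cat > acronyms <<'EOF'
BASIC   Beginner's All-Purpose Symbolic Instruction Code
CICS    Customer Information Control System
COBOL   Common Business Oriented Language
EOF

# awkro3 (hypothetical name): awkro2 plus a default pattern of ".*"
awkro3() {
	search=$1
	awk '$1 ~ /'"${search:-.*}"'/' acronyms
}

awkro3 CO    # only the COBOL line
awkro3       # no argument: every line
```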
In POSIX awk, if a program has just a BEGIN procedure and nothing else, awk will exit after executing that procedure. The original awk is different; it will execute the BEGIN procedure and then go on to process input, even if there are no pattern-action statements. You can force awk to exit by supplying /dev/null on the command line as a data file argument, or by calling exit at the end of the BEGIN procedure.
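Both workarounds look like this in practice. The function names are hypothetical; either command should stop after the BEGIN procedure even on an old awk, though the sketch has only been reasoned about against POSIX awk behavior.

```shell
# greet_devnull (hypothetical): /dev/null supplies empty input, so there
# is nothing left to read after BEGIN.
greet_devnull() {
	awk 'BEGIN { print "hello, world" }' /dev/null
}

# greet_exit (hypothetical): exit ends the program outright.
greet_exit() {
	awk 'BEGIN { print "hello, world"; exit }'
}

greet_devnull
greet_exit
```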
In addition, the BEGIN and END procedures, if present, have to be at the beginning and end of the program, respectively. Furthermore, you can only have one of each.
Field splitting works the same in old awk as it does in modern awk, except that FS cannot be a regular expression.
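If your data uses more than one separator character, one workaround is to normalize the separators before awk sees them. This sketch assumes records that mix ":" and ";"; a modern awk could simply set FS to the regular expression "[:;]", whereas here tr(1) maps ";" to ":" so that a single-character FS suffices. The name third_field is hypothetical.

```shell
# third_field (hypothetical name): prints field 3 of records that mix
# ":" and ";" separators, using only a single-character FS.
third_field() {
	tr ';' ':' | awk 'BEGIN { FS = ":" } { print $3 }'
}

printf 'a:b;c\nd;e:f\n' | third_field    # prints c and f
```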
There is no way in the original awk to delete an element from an array. The best thing you can do is assign the empty string to the unwanted array element, and then code your program to ignore array elements whose values are empty.
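The empty-string convention can be sketched as follows, with drop_second as a hypothetical name. The loop walks the indexes in numeric order instead of using for-in, because for-in does not guarantee any particular order.

```shell
# drop_second (hypothetical name): stores lines in an array, "deletes"
# element 2 by emptying it, then skips empty elements when printing.
drop_second() {
	awk '
	{ line[NR] = $0 }
	END {
		line[2] = ""                 # stand-in for: delete line[2]
		for (i = 1; i <= NR; i++)
			if (line[i] != "")
				print line[i]
	}'
}

printf 'apple\nbanana\ncherry\n' | drop_second    # prints apple and cherry
```

One limitation of this approach: a genuinely empty input line is indistinguishable from a "deleted" one, so if that matters, use a sentinel value that cannot occur in your data.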
Along the same lines, "in" is not an operator in the original awk; you cannot use "if (item in array)" to see whether an item is present. Unfortunately, this forces you to loop through every index in the array to see whether the one you want is present.
for (item in array) {
	if (item == searchkey) {
		# process array[item] here
		break
	}
}
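A runnable version of this loop, wrapped in a small shell function, might look like the following. The name lookup, the sample data, and the not-found message are assumptions; the search key is imported with the same quoting trick used for awkro2.

```shell
# lookup KEY (hypothetical name): reads "name count" lines on stdin and
# reports KEY's count, scanning the whole array because old awk has no
# "in" operator.
lookup() {
	awk '
	{ count[$1] = $2 }
	END {
		searchkey = "'"$1"'"
		found = 0
		for (item in count) {
			if (item == searchkey) {
				found = 1
				print item, count[item]
				break
			}
		}
		if (found == 0)
			print searchkey, "not found"
	}'
}

printf 'apple 3\nbanana 5\n' | lookup banana    # banana 5
printf 'apple 3\nbanana 5\n' | lookup cherry    # cherry not found
```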
The original V7 awk did not have getline. If your awk is really ancient, getline may not work at all. Some vendors have added the simplest form of getline, which reads the next record from the regular input stream and sets $0, NF, and NR (there is no FNR; see below). None of the other forms of getline is available.
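The simplest form of getline is enough for jobs like joining pairs of lines. In this sketch (pair_lines is a hypothetical name), each odd-numbered line is treated as a name and the line after it as its value.

```shell
# pair_lines (hypothetical name): joins each line with the one after it.
# Plain getline reads the next record into $0 and updates NF and NR.
pair_lines() {
	awk '{
		name = $0
		getline
		print name ": " $0
	}'
}

printf 'host\nalpha\nport\n8080\n' | pair_lines
# host: alpha
# port: 8080
```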
The original awk had only a limited number of built-in string functions. (See Table 11.1 and Table 11.3.)
Awk Function | Description |
---|---|
index(s,t) | Returns position of substring t in string s or zero if not present. |
length(s) | Returns length of string s or length of $0 if no string is supplied. |
split(s,a,sep) | Parses string s into elements of array a using field separator sep; returns number of elements. If sep is not supplied, FS is used. Array splitting works the same way as field splitting. |
sprintf("fmt",expr) | Returns expr formatted according to the printf format specification fmt. |
substr(s,p,n) | Returns the substring of s that begins at position p and has a maximum length of n. If n isn't supplied, the rest of the string from p is used. |
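The following sketch runs each function from Table 11.1 against one input line; string_demo is a hypothetical name, and the expected output is shown in comments after the call.

```shell
# string_demo (hypothetical name): exercises the original awk's string
# functions on a single input line.
string_demo() {
	awk '{
		print index($0, "awk")             # position of "awk"
		print length($0)                   # length of the whole record
		n = split($0, words)               # no sep given, so FS is used
		print n, words[3]
		print sprintf("[%-5s]", words[1])  # left-justify in 5 columns
		print substr($0, 5, 3)             # 3 characters from position 5
	}'
}

echo "sed and awk" | string_demo
# 9
# 11
# 3 awk
# [sed  ]
# and
```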
Some built-in functions can be classified as arithmetic functions. Most of them take a numeric argument and return a numeric value. Table 11.2 summarizes these arithmetic functions.
Awk Function | Description |
---|---|
exp(x) | Returns e to the power x. |
int(x) | Returns truncated value of x. |
log(x) | Returns natural logarithm (base-e) of x. |
sqrt(x) | Returns square root of x. |
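Since the original awk has exp() and log() but no exponentiation operator, x^y can be computed as exp(y * log(x)) for positive x. This sketch (math_demo is a hypothetical name) also shows that int() truncates toward zero, and that print's default numeric formatting hides the tiny rounding error exp() and log() may introduce.

```shell
# math_demo (hypothetical name): x^y via exp() and log() (valid only for
# x > 0), plus int() and sqrt().
math_demo() {
	echo "2 10" | awk '{
		print exp($2 * log($1))    # 2 to the 10th, no ^ needed
		print int(-3.7)            # truncates toward zero
		print sqrt(16)
	}'
}

math_demo
# 1024
# -3
# 4
```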
One of the nicest facilities in awk, the ability to define your own functions, is also not available in original awk.
In the original awk, only the variables shown in Table 11.3 are built in.
Variable | Description |
---|---|
FILENAME | Current filename |
FS | Field separator (a blank) |
NF | Number of fields in current record |
NR | Number of the current record |
OFMT | Output format for numbers (%.6g) |
OFS | Output field separator (a blank) |
ORS | Output record separator (a newline) |
RS | Record separator (a newline) |
In the original awk, OFMT does double duty, serving as the output format for numbers in the print statement as well as the format for converting numbers to strings. (POSIX awk splits these roles between OFMT and CONVFMT.)
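The sketch below shows where the two roles diverge in POSIX awk. It assumes a POSIX awk that supports CONVFMT; on the original awk, where OFMT covers both conversions, both lines should print 3.14. The function name ofmt_demo is hypothetical.

```shell
# ofmt_demo (hypothetical name): print of a number uses OFMT, while
# number-to-string conversion uses CONVFMT (default "%.6g") in POSIX awk.
ofmt_demo() {
	echo 3.14159265 | awk 'BEGIN { OFMT = "%.2f" } {
		x = $1 + 0
		print x          # formatted with OFMT
		s = x ""
		print s          # converted with CONVFMT in POSIX awk
	}'
}

ofmt_demo
# 3.14
# 3.14159
```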