
36.6 Miscellaneous sort Hints

Here is a grab bag of useful, if not exactly interesting, sort features. The utility will actually do quite a bit, if you let it.

36.6.1 Dealing with Repeated Lines

sort -u sorts the file and eliminates duplicate lines. It's more powerful than uniq (35.20) because:

It sorts the file for you; uniq assumes that the file is already sorted, and won't do you any good if it isn't.
It is much more flexible. sort -u treats lines as duplicates if the sort fields (36.2) you've selected match, and keeps only one line from each such group. So the lines don't even have to be (strictly speaking) identical; differences outside of the sort fields are ignored.
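A minimal sketch of both points follows. It assumes a POSIX sort with the modern -k syntax for naming a sort field, uses made-up sample data, and sets LC_ALL=C so the ordering doesn't depend on locale collation rules. With no key, sort -u drops only exact duplicates; with -k1,1, lines that differ only after the first field count as duplicates.

```sh
#!/bin/sh
# Hypothetical sample data: a user name, then a login time.
data='alice 09:00
bob 10:15
alice 13:30
bob 10:15'

# Whole-line comparison: only the exact duplicate "bob 10:15" is dropped.
all_unique=$(printf '%s\n' "$data" | LC_ALL=C sort -u)

# Compare on field 1 only: one line survives per user name.
# POSIX doesn't say which "alice" line is kept, so don't rely on it.
per_user=$(printf '%s\n' "$data" | LC_ALL=C sort -u -k1,1)

echo "$all_unique"
echo "---"
echo "$per_user"
```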

On the other hand, there are a few things that uniq does that sort won't do, like printing only the lines that aren't repeated (uniq -u) or counting the number of times each line is repeated (uniq -c). But on the whole, I find sort -u more useful.
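Because uniq only merges adjacent duplicates, those two uniq features are normally used after a plain sort. A short sketch with a hypothetical word list (LC_ALL=C is again just to pin down the order):

```sh
#!/bin/sh
# Hypothetical word list with repeats.
words='pear
apple
pear
fig
apple
pear'

# Sort first so duplicates are adjacent; -c prefixes each line with its count.
counts=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c)

# -u prints only the lines that occur exactly once.
singles=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -u)

echo "$counts"
echo "---"
echo "$singles"
```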

Here's one idea for using sort -u. When I was writing a manual, I often needed to make tables of error messages. The easiest way to do this was to grep the source code for printf statements; write some Emacs (32.1) macros to eliminate junk that I didn't care about; use sort -u to put the messages in order and get rid of duplicates; and write some more Emacs macros to format the error messages into a table. All I had to do was write the descriptions.
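The grep and sort -u steps of that workflow can be sketched in the shell alone. Here a sed command stands in for the first round of Emacs macros; the C source file, its contents, and the sed patterns are hypothetical, and mktemp -d is assumed to be available. Real code would need more care (escaped quotes, calls split across lines, fprintf, and so on).

```sh
#!/bin/sh
# Hypothetical C source to scan.
dir=$(mktemp -d) || exit 1
cat > "$dir/demo.c" <<'EOF'
    printf("cannot open file\n");
    if (p == NULL) printf("out of memory\n");
    printf("cannot open file\n");
EOF

# grep finds the printf calls; sed keeps only the format string and drops
# a trailing \n; sort -u orders the messages and removes duplicates.
messages=$(grep -h 'printf("' "$dir"/*.c |
  sed -e 's/.*printf("\([^"]*\)".*/\1/' -e 's/\\n$//' |
  LC_ALL=C sort -u)

rm -rf "$dir"
echo "$messages"
```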

36.6.2 Ignoring Blanks

One important option (that I've mentioned a number of times) is -b; this tells sort to ignore extra white space at the beginning of each field. This is absolutely essential; otherwise, the blanks in front of a field count as part of it, and your sorts will have rather strange results. In my opinion, -b should be the default. But they didn't ask me.

Another thing to remember about -b: on some versions of sort, it only works if you explicitly specify which fields you want to sort. By itself, sort -b is then the same as sort: white space characters are counted. I call this a bug, don't you?
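Here is a small sketch of the "strange results" you get without -b, assuming a POSIX sort with the -k key syntax. With the default field separator, each field keeps the blanks that precede it, so field 2 of "x   b" is "   b". LC_ALL=C makes sort compare bytes, so those blanks really count regardless of locale.

```sh
#!/bin/sh
# Two hypothetical lines; field 2 of the first has three leading blanks.
data='x   b
y a'

# Without -b, the key "   b" sorts before " a" because a blank is less than "a".
without_b=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

# With -b (given before -k, so it applies to that key), the keys are just "b" and "a".
with_b=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k2,2)

echo "without -b:"; echo "$without_b"
echo "with -b:"; echo "$with_b"
```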

36.6.3 Case-Insensitive Sorts

If you don't care about the difference between uppercase and lowercase letters, invoke sort with the -f (case-fold) option. This folds lowercase letters into uppercase. In other words, it treats all letters as uppercase.
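A quick comparison, using a hypothetical word list and LC_ALL=C so that plain sort uses byte order (where every uppercase letter comes before every lowercase one). Note that -f affects only the comparison; the lines are printed unchanged.

```sh
#!/bin/sh
fruit='cherry
Banana
apple'

# Byte order: "Banana" sorts first because "B" is less than "a".
plain=$(printf '%s\n' "$fruit" | LC_ALL=C sort)

# -f compares as if all letters were uppercase: APPLE, BANANA, CHERRY.
folded=$(printf '%s\n' "$fruit" | LC_ALL=C sort -f)

echo "$plain"; echo "---"; echo "$folded"
```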

36.6.4 Dictionary Order

The -d option tells sort to ignore all characters except for letters, digits, and white space. In particular, sort -d ignores punctuation.
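For example, with some hypothetical identifiers that start with punctuation (LC_ALL=C again fixes the byte order):

```sh
#!/bin/sh
words='#include
define
_private'

# Byte order: "#" and "_" both sort before lowercase letters.
plain=$(printf '%s\n' "$words" | LC_ALL=C sort)

# -d skips the punctuation, so the keys are include, define, private.
dict=$(printf '%s\n' "$words" | LC_ALL=C sort -d)

echo "$plain"; echo "---"; echo "$dict"
```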

36.6.5 Month Order

The -M option tells sort to treat the first three non-blank characters of a field as a three-letter month abbreviation, and to sort accordingly. That is, JAN comes before FEB, which comes before MAR. This option isn't available on all versions of UNIX.
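A sketch on hypothetical dates, assuming a sort that supports -M (GNU sort does; POSIX doesn't require it) and the -k key syntax. The M modifier is attached to field 1 only; LC_ALL=C makes sort use the English month abbreviations.

```sh
#!/bin/sh
dates='Mar 3
Jan 10
Feb 2
Dec 25'

# Plain alphabetical order on field 1: Dec, Feb, Jan, Mar.
alpha=$(printf '%s\n' "$dates" | LC_ALL=C sort -k1,1)

# Month order on field 1: Jan, Feb, Mar, Dec.
months=$(printf '%s\n' "$dates" | LC_ALL=C sort -k1,1M)

echo "$alpha"; echo "---"; echo "$months"
```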

36.6.6 Reverse Sort

The -r option tells sort to "reverse" the order of the sort; i.e., Z comes before A, 9 comes before 1, and so on. You'll find that this option is really useful. For example, imagine you have a program running in the background that records the number of free blocks in the filesystem at midnight each night. Your log file might look like this:

Jan 1 1992:  108 free blocks
Jan 2 1992:  308 free blocks
Jan 3 1992: 1232 free blocks
Jan 4 1992:   76 free blocks
...

The script below finds the smallest and largest number of free blocks in your log file:

#!/bin/sh
# The free-block count is field 2, after the colon.
echo "Minimum free blocks"
sort -t: -b -n -k2,2 logfile | head -n 1
echo "Maximum free blocks"
sort -t: -b -n -r -k2,2 logfile | head -n 1

It's not profound, but it's an example of what you can do.
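To try the idea without waiting for a real log, here is a self-contained sketch that writes the four sample lines shown earlier to a temporary file (created with mktemp, an assumption not in the original) and captures the two results in variables. It uses the POSIX -k key syntax with -t: so that field 2 is the text after the colon.

```sh
#!/bin/sh
# Write the sample log to a temporary file.
logfile=$(mktemp) || exit 1
cat > "$logfile" <<'EOF'
Jan 1 1992:  108 free blocks
Jan 2 1992:  308 free blocks
Jan 3 1992: 1232 free blocks
Jan 4 1992:   76 free blocks
EOF

# -n reads the leading number in field 2 and ignores the " free blocks" after it;
# -r reverses the order, so the largest count comes first.
min=$(sort -t: -b -n -k2,2 "$logfile" | head -n 1)
max=$(sort -t: -b -n -r -k2,2 "$logfile" | head -n 1)
rm -f "$logfile"

echo "Minimum free blocks: $min"
echo "Maximum free blocks: $max"
```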

- ML

