Here is a grab bag of useful, if not exactly interesting, sort features. The utility will actually do quite a bit, if you let it.
sort -u sorts the file and eliminates duplicate lines. It's more powerful than uniq (35.20) because:
It sorts the file for you; uniq assumes that the file is already sorted, and won't do you any good if it isn't.
It is much more flexible. sort -u considers lines "unique" if the sort fields (36.2) you've selected match. So the lines don't even have to be (strictly speaking) unique; differences outside of the sort fields are ignored.
In return, there are a few things that uniq does that sort won't do, such as printing only those lines that aren't repeated or counting the number of times each line is repeated. But on the whole, I find sort -u more useful.
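To make the difference concrete, here is a minimal sketch. The sample data (an error code followed by free-form text) is made up for illustration, and I've used the POSIX -k key syntax and LC_ALL=C so that the ordering is plain byte order. Which of the lines with a matching key survives sort -u may vary between versions of sort, so don't rely on it.

```shell
#!/bin/sh
# Hypothetical sample data: an error code followed by free-form text.
data='E2 disk full
E1 file not found
E2 no space left on device
E1 file not found'

# Whole-line uniqueness: only the exact duplicate line is dropped.
whole=$(printf '%s\n' "$data" | LC_ALL=C sort -u)

# Key-based uniqueness: lines count as "equal" when field 1 matches,
# so a single line per error code survives.
by_key=$(printf '%s\n' "$data" | LC_ALL=C sort -u -k 1,1)

# What sort -u cannot do: count repetitions. uniq -c needs sorted input.
counts=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -c)

printf '%s\n' "-- sort -u:" "$whole"
printf '%s\n' "-- sort -u -k 1,1:" "$by_key"
printf '%s\n' "-- sort | uniq -c:" "$counts"
```

The first command keeps three lines, the second keeps only two (one per error code), and the third shows the kind of tally that still needs uniq.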
Here's one idea for using sort -u. When I was writing a manual, I often needed to make tables of error messages. The easiest way to do this was to grep the source code for printf statements; write some Emacs (32.1) macros to eliminate junk that I didn't care about; use sort -u to put the messages in order and get rid of duplicates; and write some more Emacs macros to format the error messages into a table. All I had to do was write the descriptions.
One important option (that I've mentioned a number of times) is -b; this tells sort to ignore extra white space at the beginning of each field. This is absolutely essential; otherwise, your sorts will have rather strange results. In my opinion, -b should be the default. But they didn't ask me.
Another thing to remember about -b: it only works if you explicitly specify which fields you want to sort. By itself, sort -b is the same as sort: white space characters are counted. I call this a bug, don't you?
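A small sketch shows what the "strange results" look like. The two-line input is invented for the purpose, the key is written in the POSIX -k form, and LC_ALL=C is assumed so that a space sorts ahead of every letter.

```shell
#!/bin/sh
# Hypothetical input: the first line has an extra space before field 2.
input='a  z
a y'

# Without b, the leading blanks belong to field 2, so "  z" is compared
# with " y" and the second space sorts ahead of the letter y.
without_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)

# With b attached to the key, blanks at the start of the field are
# skipped, so the comparison is "z" against "y".
with_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2b)

printf '%s\n' "-- sort -k 2:" "$without_b"
printf '%s\n' "-- sort -k 2b:" "$with_b"
```

Without b, the line ending in z comes out first, which is almost never what you meant.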
If you don't care about the difference between uppercase and lowercase letters, invoke sort with the -f (case-fold) option. This folds lowercase letters into uppercase. In other words, it treats all letters as uppercase.
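Here's a quick sketch of the effect, using three made-up words and assuming LC_ALL=C, where every uppercase letter sorts ahead of every lowercase one.

```shell
#!/bin/sh
# Hypothetical word list with mixed case.
words='apple
Banana
cherry'

# Plain byte order: uppercase B sorts ahead of lowercase a.
plain=$(printf '%s\n' "$words" | LC_ALL=C sort)

# With -f, letters are compared as if they were all uppercase.
folded=$(printf '%s\n' "$words" | LC_ALL=C sort -f)

printf '%s\n' "-- sort:" "$plain"
printf '%s\n' "-- sort -f:" "$folded"
```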
The -d option tells sort to ignore all characters except for letters, digits, and white space. In particular, sort -d ignores punctuation.
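As a rough illustration, take two invented words, one of them hyphenated. I'm assuming LC_ALL=C, where the hyphen sorts ahead of the letters.

```shell
#!/bin/sh
# Hypothetical input: one hyphenated word, one plain word.
words='co-op
cook'

# Plain byte order: the hyphen sorts ahead of the letter o.
plain=$(printf '%s\n' "$words" | LC_ALL=C sort)

# With -d the hyphen is ignored, so "coop" is compared with "cook".
dict=$(printf '%s\n' "$words" | LC_ALL=C sort -d)

printf '%s\n' "-- sort:" "$plain"
printf '%s\n' "-- sort -d:" "$dict"
```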
The -M option tells sort to treat the first three non-blank characters of a field as a three-letter month abbreviation, and to sort accordingly. That is, JAN comes before FEB, which comes before MAR. This option isn't available on all versions of UNIX.
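If your sort has -M (this sketch assumes it does, and assumes LC_ALL=C so that the English abbreviations are the ones recognized), the difference from an ordinary alphabetical sort looks like this:

```shell
#!/bin/sh
# Three month abbreviations in no particular order.
months='MAR
JAN
FEB'

# Ordinary sort: alphabetical, so FEB comes first.
alpha=$(printf '%s\n' "$months" | LC_ALL=C sort)

# Month sort: calendar order. -M is not part of every sort.
calendar=$(printf '%s\n' "$months" | LC_ALL=C sort -M)

printf '%s\n' "-- sort:" "$alpha"
printf '%s\n' "-- sort -M:" "$calendar"
```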
The -r option tells sort to "reverse" the order of the sort; i.e., Z comes before A, 9 comes before 1, and so on. You'll find that this option is really useful. For example, imagine you have a program running in the background that records the number of free blocks in the filesystem at midnight each night. Your log file might look like this:
    Jan 1 1992: 108 free blocks
    Jan 2 1992: 308 free blocks
    Jan 3 1992: 1232 free blocks
    Jan 4 1992: 76 free blocks
    ...
The script below finds the smallest and largest number of free blocks in your log file:
    #!/bin/sh
    # -k 2 is the POSIX spelling of the older +1 field notation
    echo "Minimum free blocks"
    sort -t: -k 2nb logfile | head -n 1
    echo "Maximum free blocks"
    sort -t: -k 2nbr logfile | head -n 1
It's not profound, but it's an example of what you can do.
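If you'd like to try this without a real log, the sketch below feeds the four sample entries shown above through the same two pipelines. The temporary file and the variable names are my own additions, and the sort key is written as -k 2, the POSIX form of the older +1 notation.

```shell
#!/bin/sh
# Build a throwaway log from the four sample entries, so that no
# existing file is touched.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
Jan 1 1992: 108 free blocks
Jan 2 1992: 308 free blocks
Jan 3 1992: 1232 free blocks
Jan 4 1992: 76 free blocks
EOF

# Field 2 (everything after the colon) starts with the block count.
# n = compare numerically, b = skip the blank after the colon.
min=$(LC_ALL=C sort -t: -k 2nb "$logfile" | head -n 1)

# Adding r reverses the order, so the largest count comes first.
max=$(LC_ALL=C sort -t: -k 2nbr "$logfile" | head -n 1)

echo "Minimum free blocks"; echo "$min"
echo "Maximum free blocks"; echo "$max"
rm -f "$logfile"
```

With this data the minimum is the Jan 4 entry (76 blocks) and the maximum is the Jan 3 entry (1232 blocks).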