Sort Command in Linux with Examples
Sorting is the process of arranging records into a specified sequence. Examples of sorting would be arranging a list of usernames into alphabetical order or a set of file sizes into numeric order.
In its simplest form, the sort command sorts lines alphabetically (any whitespace or control characters it encounters take part in the comparison). The sort command uses the current locale (language definition) to determine the order of the characters (referred to as the collating order). In the following example, a user first displays the contents of the file /etc/sysconfig/mouse as is, and then sorts the contents of the file alphabetically.
$ cat /etc/sysconfig/mouse
FULLNAME="Generic - 2 Button Mouse (PS/2)"
MOUSETYPE="ps/2"
XEMU3="yes"
XMOUSETYPE="PS/2"
DEVICE=/dev/psaux
$ sort /etc/sysconfig/mouse
DEVICE=/dev/psaux
FULLNAME="Generic - 2 Button Mouse (PS/2)"
MOUSETYPE="ps/2"
XEMU3="yes"
XMOUSETYPE="PS/2"
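Because the collating order comes from the locale, the same input can sort differently from one system to the next. The following is a minimal sketch with made-up sample words. It forces the C locale through the LC_ALL environment variable, where comparison is by byte value and every uppercase ASCII letter precedes every lowercase one. The commented-out command uses whatever locale the session has, so its output is not shown here.

```shell
# Sample words (made up for illustration).
words=$(printf '%s\n' apple Banana cherry Date)

# Force byte-value ordering: uppercase ASCII letters sort before lowercase.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# With the session's own locale the order may differ, so try it and compare:
# printf '%s\n' "$words" | sort
```

Setting LC_ALL=C for a single command is a common way to get the same order on every machine.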
If called with arguments, the arguments are interpreted as (possibly multiple) filenames to be sorted. If called without arguments, the sort command sorts whatever it reads from standard input.
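The two ways of calling sort can be compared with a small sketch. The file names and contents below are made up, and LC_ALL=C is set only to make the order predictable.

```shell
# Create two small sample files in a temporary directory (contents are made up).
tmpdir=$(mktemp -d)
printf 'pear\napple\n' > "$tmpdir/a.txt"
printf 'orange\nbanana\n' > "$tmpdir/b.txt"

# Several file arguments: the lines of all files are sorted together.
from_files=$(LC_ALL=C sort "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$from_files"

# No file argument: sort reads standard input instead.
from_stdin=$(printf 'pear\napple\norange\nbanana\n' | LC_ALL=C sort)
printf '%s\n' "$from_stdin"

rm -r "$tmpdir"
```

Both commands should print the same four lines, since the combined input is identical.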
Modifying the sort order
By default, the sort command sorts lines alphabetically. The following table lists command line switches which can be used to modify this default sort order.
Switch | Effect |
---|---|
-b, --ignore-leading-blanks | Ignore spaces and tabs at the beginning of a line. |
-d, --dictionary-order | Consider only blanks and alphanumeric characters. |
-f, --ignore-case | Treat all characters as uppercase. |
-g, --general-numeric-sort | Compare as floating point numbers (exponent notation such as 1e3 is understood). |
-n, --numeric-sort | Compare leading numbers by value (optional minus sign, digits, optional decimal point). |
-r, --reverse | Sort in descending rather than ascending order. |
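Two of the switches in the table can be tried on made-up input as a rough illustration. The sketch assumes GNU sort, since -g is not available in every implementation, and sets LC_ALL=C so that the plain sort is case sensitive.

```shell
# -f folds lowercase to uppercase before comparing, so case is ignored.
plain=$(printf 'apple\nBanana\ncherry\n' | LC_ALL=C sort)
folded=$(printf 'apple\nBanana\ncherry\n' | LC_ALL=C sort -f)
printf 'plain:\n%s\nfolded:\n%s\n' "$plain" "$folded"

# -n stops reading a number at the "e", so 1e3 is treated as 1;
# -g (assumed here: GNU sort) parses it as 1000.
numeric=$(printf '1e3\n50\n' | LC_ALL=C sort -n)
general=$(printf '1e3\n50\n' | LC_ALL=C sort -g)
printf 'numeric:\n%s\ngeneral:\n%s\n' "$numeric" "$general"
```

In the C locale the plain sort places Banana first, while -f places apple first. Likewise -n places 1e3 before 50, and -g places it after.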
As an example, a user examines the sizes of all files in the /var/log directory whose names start with an m.
$ ls -s1 /var/log/m*
20 /var/log/maillog
3104 /var/log/maillog.1
1552 /var/log/maillog.2
1952 /var/log/maillog.3
1236 /var/log/maillog.4
4 /var/log/messages
384 /var/log/messages.1
636 /var/log/messages.2
216 /var/log/messages.3
560 /var/log/messages.4
The user next sorts the output with the sort command.
$ ls -s /var/log/m* | sort
1236 /var/log/maillog.4
1552 /var/log/maillog.2
1952 /var/log/maillog.3
20 /var/log/maillog
216 /var/log/messages.3
3104 /var/log/maillog.1
384 /var/log/messages.1
4 /var/log/messages
560 /var/log/messages.4
636 /var/log/messages.2
Without being told otherwise, the sort command sorted the lines alphabetically (with 1952 coming before 20). Realizing this is not what was intended, the user adds the -n command line switch.
$ ls -s /var/log/m* | sort -n
4 /var/log/messages
20 /var/log/maillog
216 /var/log/messages.3
384 /var/log/messages.1
560 /var/log/messages.4
636 /var/log/messages.2
1236 /var/log/maillog.4
1552 /var/log/maillog.2
1952 /var/log/maillog.3
3104 /var/log/maillog.1
Better, but the user would prefer to reverse the sort order, so that the largest files come first. The user adds the -r command line switch.
$ ls -s /var/log/m* | sort -nr
3104 /var/log/maillog.1
1952 /var/log/maillog.3
1552 /var/log/maillog.2
1236 /var/log/maillog.4
636 /var/log/messages.2
560 /var/log/messages.4
384 /var/log/messages.1
216 /var/log/messages.3
20 /var/log/maillog
4 /var/log/messages
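The three sort orders can be reproduced without access to /var/log. The sketch below feeds a few size and name pairs from the listing above to sort through printf. The use of head to keep only the largest entry is an addition for illustration.

```shell
# A few size/name pairs taken from the listing above.
listing=$(printf '%s\n' '20 /var/log/maillog' '3104 /var/log/maillog.1' \
                        '4 /var/log/messages' '636 /var/log/messages.2')

# Default: compared character by character, so 3104 sorts before 4.
alpha=$(printf '%s\n' "$listing" | LC_ALL=C sort)

# -n: compared by numeric value.
numeric=$(printf '%s\n' "$listing" | LC_ALL=C sort -n)

# -nr plus head: keep only the largest entry.
largest=$(printf '%s\n' "$listing" | LC_ALL=C sort -nr | head -n 1)

printf 'alphabetical:\n%s\nnumeric:\n%s\nlargest:\n%s\n' "$alpha" "$numeric" "$largest"
```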
Why ls -1? Why was the -1 command line switch given to the ls command in the first example, but not in the others? By default, when the ls command writes to a terminal on standard output, it groups the filenames into multiple columns for readability. When standard output is a pipe or a file, however, it prints one file per line. The -1 command line switch forces this behavior for terminal output as well.
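One way to check this behavior is to count the lines that ls writes into a pipe. The following sketch uses a temporary directory with three made-up file names.

```shell
# Create three empty sample files in a temporary directory.
tmpdir=$(mktemp -d)
touch "$tmpdir/a" "$tmpdir/b" "$tmpdir/c"

# Standard output is a pipe here, so ls prints one name per line
# even without -1; wc -l therefore counts the files.
piped=$(ls "$tmpdir" | wc -l)

# -1 requests the same one-per-line layout explicitly.
forced=$(ls -1 "$tmpdir" | wc -l)

echo "piped:" $piped "forced:" $forced
rm -r "$tmpdir"
```

Both counts should be 3, which shows that -1 changes nothing once the output goes to a pipe.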
Specifying Sort Keys
In the previous examples, the sort command performed its sort based on the first characters found on a line. Often, formatted data is not arranged so conveniently. Fortunately, the sort command allows users to specify which column of tabular data to use for determining the sort order, or, more formally, which column should be used as the sort key.
The command line switches in the following table determine the sort key.
Switch | Effect |
---|---|
-k, --key=POS | Use the key at POS to determine sort order. |
-t, --field-separator=SEP | Use the single character SEP to separate fields (instead of whitespace). |
Sorting Output by a Particular Column
As an example, suppose the user wants to re-examine the log files, using the long format of the ls command. The user first tries to sort the output numerically.
# ls -l /var/log/m* | sort -n
-rw-------. 1 root root 53524 Jun 11 02:37 /var/log/maillog-20201024
-rw-------. 1 root root 0 Oct 24 15:36 /var/log/maillog
-rw-------. 1 root root 3388685 Oct 24 15:35 /var/log/messages-20201024
-rw-------. 1 root root 743976 Oct 30 12:48 /var/log/messages
Now that the sizes are no longer reported at the beginning of the line, every line begins with non-numeric text, so -n finds no leading number to compare and the sizes end up out of order. The user therefore repeats the sort with the -k command line switch to sort the output by the 5th column, producing the desired output.
# ls -l /var/log/m* | sort -n -k5
-rw-------. 1 root root 0 Oct 24 15:36 /var/log/maillog
-rw-------. 1 root root 53524 Jun 11 02:37 /var/log/maillog-20201024
-rw-------. 1 root root 744999 Oct 30 12:49 /var/log/messages
-rw-------. 1 root root 3388685 Oct 24 15:35 /var/log/messages-20201024
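A key given as -k5 starts at field 5 and runs to the end of the line. A common refinement is to write -k5,5n, which ends the key at field 5 as well and attaches the numeric option to that key only. The sketch below applies it to three lines copied from the listing above; awk is used only to print the size column.

```shell
# Three lines in "ls -l" layout, taken from the listing above.
listing=$(printf '%s\n' \
  '-rw-------. 1 root root 53524 Jun 11 02:37 /var/log/maillog-20201024' \
  '-rw-------. 1 root root 0 Oct 24 15:36 /var/log/maillog' \
  '-rw-------. 1 root root 3388685 Oct 24 15:35 /var/log/messages-20201024')

# -k5,5n: the key starts and ends at field 5 and is compared numerically.
by_size=$(printf '%s\n' "$listing" | LC_ALL=C sort -k5,5n)

# Print only the size column to check the order.
sizes=$(printf '%s\n' "$by_size" | awk '{print $5}')
printf '%s\n' "$sizes"
```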
Specifying Multiple Sort Keys
Next, the user examines the file /etc/services, using the grep command to extract the lines whose service name starts with “a”.
# cat /etc/services | grep ^a
auth 113/tcp authentication tap ident
auth 113/udp authentication tap ident
at-rtmp 201/tcp # AppleTalk routing
at-rtmp 201/udp
at-nbp 202/tcp # AppleTalk name binding
at-nbp 202/udp
at-echo 204/tcp # AppleTalk echo
at-echo 204/udp
at-zis 206/tcp # AppleTalk zone information
at-zis 206/udp
acap 674/tcp
acap 674/udp
afpovertcp 548/tcp # AFP over TCP
afpovertcp 548/udp # AFP over TCP
afs3-fileserver 7000/tcp # file server itself
...
The user next sorts the data alphabetically, using the 1st column as the key (no -n switch is given, so the comparison is not numeric).
# cat /etc/services | grep ^a | sort -k1
a13-an 3125/tcp # A13-AN Interface
a13-an 3125/udp # A13-AN Interface
a14 3597/tcp # A14 (AN-to-SC/MM)
a14 3597/udp # A14 (AN-to-SC/MM)
a15 3598/tcp # A15 (AN-to-AN)
a15 3598/udp # A15 (AN-to-AN)
a16-an-an 4598/tcp # A16 (AN-AN)
a16-an-an 4598/udp # A16 (AN-AN)
....
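The -k switch can be given more than once. Later keys are consulted only when all earlier keys compare equal. The following sketch uses made-up lines in the style of /etc/services (they are not taken from the real file). It sorts by port number first and by service name second.

```shell
# Made-up sample lines in /etc/services style: name, then port/protocol.
services=$(printf '%s\n' 'www 80/tcp' 'domain 53/udp' 'ssh 22/tcp' \
                         'http 80/tcp' 'dns 53/udp')

# First key: field 2 compared numerically (-n reads the digits before "/").
# Second key: field 1, used only to order lines whose ports are equal.
sorted=$(printf '%s\n' "$services" | LC_ALL=C sort -k2,2n -k1,1)
printf '%s\n' "$sorted"
```

Each key is written as START,END so that it covers exactly one field. Without the END part, the first key would extend to the end of the line and the second key would rarely be consulted.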
Specifying the Field Separator
The above examples have demonstrated how to sort data using a specified field as the sort key. In all of the examples, fields were separated by whitespace (i.e., a series of spaces and/or tabs). Often in Linux (and Unix), some other method is used to separate fields. Consider, for example, the /etc/passwd file.
# head /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
The lines are structured into seven fields each, but the fields are separated using a “:” instead of whitespace. With the -t command line switch, the sort command can be instructed to use some specified character (such as a “:”) to separate fields.
In the following, the user runs the sort command with the -t command line switch to sort the first 10 lines of the /etc/passwd file by home directory (the 6th field).
# head /etc/passwd | sort -t: -k6
bin:x:1:1:bin:/bin:/sbin/nologin
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
halt:x:7:0:halt:/sbin:/sbin/halt
daemon:x:2:2:daemon:/sbin:/sbin/nologin
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
The user bin, with a home directory of /bin, is now at the top, and the user mail, with a home directory of /var/spool/mail, is at the bottom.
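The -t switch combines with numeric keys in the same way. As a sketch, the following sorts four lines from the /etc/passwd excerpt above by user ID (the 3rd field), once numerically and once as text. The cut command is used only to reduce each line to the user name.

```shell
# Four lines from the /etc/passwd excerpt above, in shuffled order.
passwd_lines=$(printf '%s\n' \
  'mail:x:8:12:mail:/var/spool/mail:/sbin/nologin' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin')

# Field 3 (the UID) as a numeric key; cut keeps only the user name.
by_uid=$(printf '%s\n' "$passwd_lines" | LC_ALL=C sort -t: -k3,3n | cut -d: -f1)

# Without n the same key is compared as text, so 11 sorts before 4.
by_uid_text=$(printf '%s\n' "$passwd_lines" | LC_ALL=C sort -t: -k3,3 | cut -d: -f1)

printf 'numeric:\n%s\ntext:\n%s\n' "$by_uid" "$by_uid_text"
```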
Summary
In summary, we have seen that the sort command can be used to sort structured data, using the -k command line switch to specify the sort field (perhaps more than once), and the -t command line switch to specify the field delimiter.
The -k command line switch can receive more sophisticated arguments, which serve to specify character positions within a field, or customize sort options for individual fields. See the sort(1) man page for details.
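As a rough illustration of those more sophisticated arguments, the sketch below uses made-up lines consisting of a date and an action. A position written as FIELD.CHAR selects a character within a field, and option letters appended to a key apply to that key only.

```shell
# Made-up sample lines: an ISO date, then an action.
log=$(printf '%s\n' '2021-03-15 backup' '2020-11-02 restore' '2019-07-30 backup')

# -k1.6,1.7n: characters 6 to 7 of field 1 (the month), compared numerically.
by_month=$(printf '%s\n' "$log" | LC_ALL=C sort -k1.6,1.7n | cut -d' ' -f1)

# -k2,2 -k1,1r: action ascending, then date descending within each action.
by_action=$(printf '%s\n' "$log" | LC_ALL=C sort -k2,2 -k1,1r)

printf 'by month:\n%s\nby action:\n%s\n' "$by_month" "$by_action"
```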