How to use grep command in Linux (Along with Regular Expressions)
The grep Command
Like most UNIX commands, grep is a mnemonic. It is derived from ex editor commands and stands for globally (g) search for a regular expression (re) and print (p) the results. The grep utility searches text files for a specified pattern and prints all lines that contain that pattern. If no files are specified, grep reads text from standard input.
Consider the following scenario. A user comes to you and says that the msxyz program on her machine has locked up her machine. She cannot get the program to halt. It is your job to find the process and stop it. The ‘ps -ef’ command gives you a long list of running processes. The output is probably too long to find this user’s process. You want a single line or a specific process list.
$ ps -ef | grep 'msxyz'
The previous grep command takes the ‘ps -ef’ command output as its input. The grep command performs a search for the string msxyz and prints the results. This gives you specific lines to review. The grep command syntax is:
grep [options] pattern [file ...]
You have multiple terminal and console windows open. To see those specific processes, execute a ps command and search for the dtterm command.
$ ps -e | grep 'dtterm'
352 ?? 0:00 dtterm
353 ?? 0:13 dtterm
354 ?? 0:11 dtterm
1766 pts/5 0:00 dtterm
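Because grep reads standard input when no file is named, the same filter works on the output of any command. The following is a minimal sketch rather than real system output: the sample_ps function and its process lines are made up to stand in for ps, so the example runs on any machine.

```shell
# Made-up process list standing in for real `ps -e` output (assumption).
sample_ps() {
    printf '%s\n' \
        '  352 ??     0:00 dtterm' \
        '  401 ??     0:02 sendmail' \
        ' 1766 pts/5  0:00 dtterm'
}

# No file argument is given, so grep filters what arrives on standard input.
matches=$(sample_ps | grep 'dtterm')
printf '%s\n' "$matches"
```

Only the two dtterm lines are printed; the sendmail line is filtered out.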
The grep Options
The table below shows grep command options. These options modify grep behavior.
Option | Meaning |
---|---|
-i | Makes the command case insensitive |
-c | Prints the count of lines that match |
-l | Prints the names of the files in which the lines match |
-v | Prints the lines that do not contain the search pattern |
-n | Prints the line numbers |
$ grep -i 'the' /etc/default/login
# Set the TZ environment variable of the shell.
# ULIMIT sets the file size limit for the login. Units are disk blocks.
# The default of zero means no limit.
# ALTSHELL determines if the SHELL environment variable should be set
# PATH sets the initial shell PATH variable
Compare this to the output of the following command, which prints only the lines of text that match the pattern “The”.
$ grep 'The' /etc/default/login
# The default of zero means no limit.
# bad password is provided. The range is limited from
# The SYSLOG_FAILED_LOGINS variable is used to determine how many failed
The -c option counts the number of lines that match the pattern. It then prints the count and not the actual lines that matched the pattern.
$ grep -ci 'the' /etc/default/login
19
$ grep -c 'The' /etc/default/login
3
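If your system has no /etc/default/login file, you can try the same options on a small file of your own. The sketch below assumes a throwaway file created with mktemp; its three lines are invented sample data.

```shell
# Hypothetical sample file; its contents are invented for this sketch.
demo=$(mktemp)
printf '%s\n' \
    '# The default of zero means no limit.' \
    '# Set the TZ environment variable of the shell.' \
    'TIMEOUT=300' > "$demo"

exact=$(grep -c 'The' "$demo")      # case-sensitive count of matching lines
any_case=$(grep -ci 'the' "$demo")  # case-insensitive count of matching lines
echo "The: $exact, the in any case: $any_case"
rm -f "$demo"
```

Note that -c counts matching lines, not occurrences: the second line contains "the" twice but adds only one to the count.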
Use the -l option to:
- Search for a string in many files.
- Have the output list only the files in which the string is found.
The -l option is often useful when you want to feed the output of grep to another utility to process a list.
# grep -l 'grep' /etc/init.d/*
/etc/init.d/apache
/etc/init.d/cachefs.daemon
/etc/init.d/dhcp
/etc/init.d/dodatadm.udaplt
/etc/init.d/dtlogin
/etc/init.d/imq
/etc/init.d/init.wbem
/etc/init.d/ncakmod
/etc/init.d/swupboots
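One way to feed the -l output to another utility is a pipe into xargs. This is a sketch under assumptions: the directory and the three script names are created only for the demonstration, and the file names are assumed to contain no whitespace (xargs splits its input on whitespace).

```shell
# Throwaway directory with hypothetical script files.
dir=$(mktemp -d)
printf 'ps -ef | grep httpd\n'     > "$dir/apache"
printf 'echo starting\n'           > "$dir/dhcp"
printf 'grep -c root /etc/group\n' > "$dir/imq"

# -l prints the name of each matching file once, not the matching lines.
files=$(grep -l 'grep' "$dir"/*)
printf '%s\n' "$files"

# xargs hands those names to wc -l, which counts the lines in each file.
grep -l 'grep' "$dir"/* | xargs wc -l
rm -rf "$dir"
```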
To find a search pattern in a large file, print the line number before each match using the -n option. This is useful when you are editing files:
# grep -n 'user' /etc/passwd
18:user1:x:100:10::/export/home/user1:/bin/sh
19:user2:x:101:10::/export/home/user2:/bin/ksh
The -v option prints lines that do not contain the search pattern.
# grep -v 'root' /etc/group
staff::10:
sysadmin::14:
smmsp::25:
gdm::50:
webservd::80:
postgres::90:
nobody::60001:
noaccess::60002:
nogroup::65534:
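The -v and -n options can be tried safely on a copy of a few group entries. The file below is a hypothetical stand-in for /etc/group, created only for this sketch.

```shell
# Hypothetical group file contents.
groups_file=$(mktemp)
printf '%s\n' 'root::0:' 'other::1:root' 'staff::10:' 'nobody::60001:' > "$groups_file"

without_root=$(grep -v 'root' "$groups_file")  # lines that do NOT contain root
numbered=$(grep -n 'staff' "$groups_file")     # matching line prefixed with its line number
printf '%s\n' "$without_root" "$numbered"
rm -f "$groups_file"
```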
Regular Expression Metacharacters
A regular expression (RE) is a character pattern that matches the same characters in a search. Regular expressions:
- Allow you to specify patterns to search in text.
- Provide a powerful way to search files for specific pattern occurrences.
- Give additional meaning to patterns (as shown in the table below).
When you use regular expression characters with the grep command, enclose the pattern in quotes. Some regular expression characters used by grep are also shell metacharacters, and the shell might interpret one as a file name wildcard before grep ever sees it. Use single (') quotes, which hide more metacharacters from the shell than double quotes do.
The table below lists the grep command metacharacters:
Metacharacter | Function |
---|---|
\ | Escapes the special meaning of an RE character |
^ | Matches the beginning of the line |
$ | Matches the end of the line |
\< | Matches the beginning of word anchor |
\> | Matches the end of word anchor |
[] | Matches any one character from the specified set |
[-] | Matches any one character in the specified range |
* | Matches zero or more of the preceding character |
. | Matches any single character |
\{ \} | Specifies the minimum and maximum number of matches for a regular expression |
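The advice about quoting patterns can be checked without grep at all, because the shell does the expansion before any command runs. This sketch uses echo to show what a command would receive; the directory and the file name abc are made up for the demonstration.

```shell
# Throwaway directory containing a single file named abc (hypothetical).
dir=$(mktemp -d)
printf 'aaa\n' > "$dir/abc"

# Command substitution runs in a subshell, so the cd does not affect the caller.
unquoted=$(cd "$dir" && echo a*)    # the shell expands a* to the file name abc
quoted=$(cd "$dir" && echo 'a*')    # single quotes pass a* through untouched
echo "unquoted: $unquoted, quoted: $quoted"
rm -rf "$dir"
```

Without quotes, grep would have received abc as its pattern instead of the regular expression a*.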
Regular Expressions
Using a regular expression, you can search the current process table (and header) for any process that contains a capital letter. Do this by using the following range as the pattern to the grep command:
# ps -ef | grep '[A-Z]'
UID PID PPID C STIME TTY TIME CMD
root 647 1 0 06:14:45 ? 0:00 /usr/lib/dmi/snmpXdmid -s sls-s10
host
noaccess 797 1 0 06:15:03 ? 1:34 /usr/java/bin/java -server -Xmx128m
XX:+UseParallelGC -XX:ParallelGCThreads=4
root 708 704 4 06:14:50 ? 5:22 /usr/X11/bin/Xorg :0 -depth 24
nobanner -auth /var/dt/A:0-9Aaayb
root 813 739 0 06:15:16 ? 0:00 /bin/ksh /usr/dt/bin/Xsession
root 905 903 0 06:15:27 pts/2 0:00 -sh -c unset DT; DISPLAY=:0;
/usr/dt/bin/dtsession_res -merge
root 1045 1 1 06:15:51 ? 1:10 /usr/lib/mixer_applet2 --oaf
activate-iid=OAFIID:GNOME_MixerApplet_Factory --oa
root 1050 1 0 06:15:52 ? 0:01 /usr/lib/notification-area-applet -oaf-activate-iid=OAFIID:GNOME_NotificationA
root 1440 1284 0 08:20:35 pts/4 0:00 grep [A-Z]
If you are only interested in current processes that contain the capital letter A in the line, limit the pattern to specify that character.
# ps -ef | grep 'A'
root 708 704 7 06:14:50 ? 5:26 /usr/X11/bin/Xorg :0 -depth 24
nobanner -auth /var/dt/A:0-9Aaayb
root 905 903 0 06:15:27 pts/2 0:00 -sh -c unset DT; DISPLAY=:0;
/usr/dt/bin/dtsession_res -merge
root 1442 1284 0 08:21:17 pts/4 0:00 grep A
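The difference between a range and a single character is easier to see on a short, fixed input. The sketch below uses three invented process lines in place of ps output. It also sets LC_ALL=C for the range, on the assumption that your locale might otherwise let [A-Z] match more than the ASCII capital letters.

```shell
# Invented process lines standing in for `ps -ef` output.
sample() {
    printf '%s\n' \
        'root  813 /bin/ksh /usr/dt/bin/Xsession' \
        'root  905 /usr/lib/sendmail -bd' \
        'root  708 /usr/X11/bin/Xorg -auth /var/dt/A:0'
}

# [A-Z] matches any line with at least one capital letter.
with_capital=$(sample | LC_ALL=C grep '[A-Z]')
# A plain A matches only lines that contain that one letter.
with_A=$(sample | grep 'A')
printf '%s\n' "$with_capital" '---' "$with_A"
```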
Escaping a Regular Expression
To escape a regular expression character, precede it with a \ (backslash). A backslash followed by a single character matches that character literally. Thus, a \$ matches a dollar sign and a \. matches a period. Doing this divests a metacharacter of its special meaning. The following example shows the $ as a regular expression character that matches the end of a line.
# grep '$' /etc/init.d/nfs.server
#!/sbin/sh
case "$1" in
'start')
svcadm enable -t network/nfs/server
;;
'stop')
svcadm disable -t network/nfs/server
;;
*) echo "Usage: $0 { start | stop }"
exit 1
;;
esac
The output contains all the lines from the script because the $ matches the end-of-line character for each line in the script. Verify this with the wc command.
$ grep '$' /etc/init.d/nfs.server | wc -l
24
$ wc -l /etc/init.d/nfs.server
24 /etc/init.d/nfs.server
To display only the lines from the nfs.server boot script that contain the literal character $, hide its special meaning by preceding the character with the \ regular expression character.
$ grep '\$' /etc/init.d/nfs.server
case "$1" in
echo "Usage: $0 { start | stop }"
$ grep '\$' /etc/init.d/nfs.server | wc -l
2
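The same comparison can be reproduced on a four-line script of your own. The file contents below are invented for the sketch. The last command uses the -F option, which tells grep to treat the pattern as a fixed string, as an alternative to the backslash.

```shell
# Hypothetical four-line script.
script=$(mktemp)
printf '%s\n' '#!/bin/sh' 'case "$1" in' 'start) echo go ;;' 'esac' > "$script"

all=$(grep -c '$' "$script")       # $ alone matches the end of every line
literal=$(grep -c '\$' "$script")  # \$ matches only a literal dollar sign
fixed=$(grep -cF '$' "$script")    # -F disables regular expression meaning
echo "all: $all, literal: $literal, fixed: $fixed"
rm -f "$script"
```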
Line Anchors
An anchor is a symbol that matches a character position on a line. The ^ and $ anchors match text patterns relative to the beginning ^ or ending $ of a line of text. For example, the following command finds all lines that contain the pattern root in the /etc/group file:
$ grep 'root' /etc/group
root::0:
other::1:root
bin::2:root,daemon
sys::3:root,bin,adm
adm::4:root,daemon
uucp::5:root
If you intend to display only the one entry for the root group in the /etc/group file, then the pattern must specify that the line begins with the pattern (given the syntax of the file).
$ grep '^root' /etc/group
root::0:
The ^ regular expression character allows you to anchor the pattern match to the beginning of the line. Similarly, the $ regular expression character allows you to anchor the pattern match to the end of the line. Lines print only if the specified pattern represents the characters preceding the end-of-line character.
$ grep 'mount$' /etc/vfstab
#device device mount FS fsck mount mount
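Both anchors can be tried on a few sample group entries. The file below is a hypothetical stand-in for /etc/group. The last command combines the two anchors so that the pattern must match the whole line.

```shell
# Hypothetical group entries.
f=$(mktemp)
printf '%s\n' 'root::0:' 'other::1:root' 'uucp::5:root' > "$f"

starts=$(grep '^root' "$f")         # root at the beginning of the line
ends=$(grep 'root$' "$f")           # root at the end of the line
whole=$(grep -c '^root::0:$' "$f")  # both anchors: the entire line must match
printf '%s\n' "$starts" '---' "$ends" '---' "$whole"
rm -f "$f"
```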
Word Anchors
A backslash used with an angle bracket is a word anchor. The less-than bracket (\<) marks the beginning of a word: text that follows it is matched only when it occurs at the beginning of a word. The greater-than bracket (\>) marks the end of a word: text that precedes it is matched only when it occurs at the end of a word. Words are delimited by spaces, tabs, beginnings of line, ends of line, and punctuation. For example, if you wanted to print the group file entry for the uucp group, issuing the grep command without regular expression characters gives you the uucp and nuucp group entries. Using the following command, however, should give only the single group entry for uucp.
$ grep '\<uucp' /etc/group
uucp::5:root
Use both word anchors at the same time to ensure your pattern is a complete word by itself, rather than a sub-string of another word. Note the output if you search for the pattern user in the /etc/passwd file.
$ grep 'user' /etc/passwd
user:x:100:1::/home/user:/bin/sh
user2:x:101:1::/home/user2:/bin/sh
user3:x:102:1::/home/user3:/bin/sh
The preceding output includes lines with user as a sub-string of words, such as user2 and user3. If you were searching for the specific user named user, you should use both word anchors (or the -w option).
$ grep '\<user\>' /etc/passwd
user:x:100:1::/home/user:/bin/sh
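The sketch below compares the word anchors with the -w option on four invented passwd-style entries. It assumes a grep that supports \< and \> (GNU grep does); these anchors are an extension to the basic regular expression syntax, so other implementations may differ.

```shell
# Hypothetical passwd-style entries.
f=$(mktemp)
printf '%s\n' \
    'user:x:100:1::/home/user:/bin/sh' \
    'user2:x:101:1::/home/user2:/bin/sh' \
    'uucp:x:5:5::/var/spool:/bin/sh' \
    'nuucp:x:9:9::/var/spool:/bin/sh' > "$f"

anchored=$(grep -c '\<user\>' "$f")  # user as a complete word
dash_w=$(grep -cw 'user' "$f")       # -w asks for the same whole-word match
begin_only=$(grep -c '\<uucp' "$f")  # uucp only at the beginning of a word
echo "anchored: $anchored, -w: $dash_w, begin only: $begin_only"
rm -f "$f"
```

Each count is 1: user2 and nuucp are rejected because the pattern would begin or end in the middle of a word.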
Character Classes
A string enclosed in square brackets specifies a character class. Any single character in the string is matched. For example, the grep '[abc]' frisbee command displays every line that contains an a, b, or c in the frisbee file. The following command prints the lines from the /etc/group file that contain either the letter i or the letter u.
$ grep '[iu]' /etc/group
bin::2:root,daemon
sys::3:root,bin,adm
uucp::5:root
mail::6:root
nuucp::9:root
sysadmin::14:
nogroup::65534:
You might also specify a range of characters, which results in printing lines that contain at least one of the specified characters in the range.
$ grep '[u-y]' /etc/group
sys::3:root,bin,adm
uucp::5:root
tty::7:root,adm
nuucp::9:root
sysadmin::14:
webservd::80:
nobody::60001:
nogroup::65534:
The following examples show the contents of the teams file and how character classes can be used to find the word the or The in any line in the teams file:
$ cat teams
Team one consists of
Tom
Team two consists of
Fred
The teams are chosen randomly.
Tea for two and Dom
Tea for two and Tom
$ grep '\<[Tt]he\>' teams
The teams are chosen randomly.
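A character class is often combined with a whole-word match. The sketch below uses the -w option mentioned earlier instead of the word anchors. The file is a shortened, partly invented version of the teams file (the last line is not in the original).

```shell
# Shortened teams file; the last line is invented for this sketch.
teams=$(mktemp)
printf '%s\n' \
    'Team one consists of' \
    'Tom' \
    'The teams are chosen randomly.' \
    'Tea for two and the rest' > "$teams"

the_words=$(grep -w '[Tt]he' "$teams")  # the or The as a complete word
toms=$(grep '[DT]om' "$teams")          # Dom or Tom
printf '%s\n' "$the_words" '---' "$toms"
rm -f "$teams"
```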
Character Match
Single Character Match
The . regular expression character matches any one character except the newline character. The following command looks for all lines containing c, followed by any three characters, followed by h.
$ grep 'c...h' /usr/dict/words
The following command anchors the pattern to the beginning of the line, so it finds only lines in which c is the first character, followed by any three characters, followed by h.
$ grep '^c...h' /usr/dict/words
The following command anchors the pattern at both ends, so the c must be the first character of the line and the h the last; that is, it finds five-letter words that begin with c and end with h.
$ grep '^c...h$' /usr/dict/words
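Not every system has a /usr/dict/words file, so the sketch below runs the same three patterns against a four-word list made up for the purpose.

```shell
# Small invented word list, one word per line.
words=$(mktemp)
printf '%s\n' 'catch' 'scotch' 'couches' 'church' > "$words"

anywhere=$(grep -c 'c...h' "$words")   # c, any three characters, h, anywhere in the line
at_start=$(grep -c '^c...h' "$words")  # the same, but the c must start the line
exact=$(grep '^c...h$' "$words")       # five-letter lines that begin with c and end with h
echo "anywhere: $anywhere, at start: $at_start, exact: $exact"
rm -f "$words"
```

The word church matches none of the patterns, because neither c is followed by exactly three characters and then an h.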
Character Match by Specifying a Range
The \{ and \} expressions allow you to specify the minimum and maximum number of times the preceding character can be repeated. A single number, as in \{3\}, requires exactly that many repetitions. The following example shows the use of this expression.
$ cat test
root
rooot
roooot
rooooot
$ grep 'ro\{3\}t' test
rooot
$ grep 'ro\{2,4\}t' test
root
rooot
roooot
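The interval expression has a few more forms worth trying. The sketch below adds an open-ended interval and, as an alternative spelling, the extended syntax selected by grep -E, where the braces need no backslashes. The test file is recreated here with one extra line (rot).

```shell
# Test file with one to five o characters between r and t.
f=$(mktemp)
printf '%s\n' 'rot' 'root' 'rooot' 'roooot' 'rooooot' > "$f"

exactly3=$(grep 'ro\{3\}t' "$f")         # exactly three o characters
two_to_four=$(grep -c 'ro\{2,4\}t' "$f") # between two and four
at_least4=$(grep -c 'ro\{4,\}t' "$f")    # four or more
ere=$(grep -cE 'ro{2,4}t' "$f")          # extended syntax, same meaning as \{2,4\}
echo "$exactly3 $two_to_four $at_least4 $ere"
rm -f "$f"
```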
Closure (*)
The *, when used in a regular expression, is termed a closure. The closure symbol matches the preceding symbol or character zero or more times.
$ grep 'Team*' teams
Team one consists of
Team two consists of
Tea for two and Dom
Tea for two and Tom
For example, to find all lines that contain a word beginning with T and a word ending with m, use the following command:
$ grep '\<T.*m\>' teams
Team one consists of
Tom
Team two consists of
Tea for two and Dom
Tea for two and Tom
An asterisk (*) has special meaning only when it follows another character. If it is the first character in a regular expression or if it is by itself, it has no special meaning. The following example searches for lines containing a literal asterisk within the file called teams.
$ grep '*' teams
The asterisk has another meaning outside of the regular expression. The following command searches all files in the current directory for the string abc.
$ grep 'abc' *
data1:abcd
In the above example, the * is a shell metacharacter rather than a grep metacharacter.
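The three roles of the asterisk can be compared side by side. The sample file below is invented for this sketch. The quotes matter in every case: they keep the shell from expanding the asterisk before grep sees it.

```shell
# Invented sample lines, one of which contains a literal asterisk.
f=$(mktemp)
printf '%s\n' 'Team one' 'Tea for two' 'Tom' '2 * 3 = 6' > "$f"

closure=$(grep -c 'Team*' "$f")  # Tea followed by zero or more m characters
leading=$(grep '*' "$f")         # a leading * has no special meaning
escaped=$(grep '\*' "$f")        # an escaped * is always literal
echo "closure: $closure"
printf '%s\n' "$leading" "$escaped"
rm -f "$f"
```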
The egrep Command
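The egrep command (or the extended grep command) searches a file for a pattern using full regular expressions, which add operators such as | for alternation. In the following example, grep treats the | as an ordinary character and prints nothing, while egrep prints every line that matches either alternative: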
# grep "two | team" teams
# egrep "two | team" teams
Team two consists of
The teams are chosen randomly.
Tea for two and Dom
Tea for two and Tom
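The same alternation can be written with grep -E, the standard option that selects extended regular expressions. This is a sketch on a shortened teams file; the `|| true` is there only because grep returns a non-zero exit status when nothing matches.

```shell
# Shortened copy of the teams file.
f=$(mktemp)
printf '%s\n' \
    'Team one consists of' \
    'Team two consists of' \
    'The teams are chosen randomly.' > "$f"

ere=$(grep -cE 'two|team' "$f")         # extended syntax: two OR team
bre=$(grep -c 'two|team' "$f" || true)  # basic syntax: the literal text two|team
echo "grep -E: $ere lines, grep: $bre lines"
rm -f "$f"
```

The first line does not match either command, because Team begins with a capital T and the pattern is case sensitive.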