Featured Articles: grep: A Pattern Searching Utility

## A pattern searching utility
## Chuck Rouillard <>

grep is a utility that searches for a word or pattern in a stream of characters, such as a file. When you use grep to search text files, for example, it prints each complete line in which a match for your pattern was found.

Suppose, for example, that you have a file named mouse.txt which contains:

	I got the idea for the mouse while attending a talk
	at a computer conference.  The speaker was so boring
	that I started daydreaming and hit upon the idea
					    - Doug Engelbart

To search for instances of 'mouse', you could type:

	% grep mouse mouse.txt

which would result in the first line:

	I got the idea for the mouse while attending a talk

Or, searching for instances of 'the':

	% grep the mouse.txt

would result in the first and third lines:

	I got the idea for the mouse while attending a talk
	that I started daydreaming and hit upon the idea

but not the second line, which contains the word 'The'. This is because grep is case sensitive by default. If we wanted to find all instances of the word 'the' without regard for letter case, we could do:

	% grep -i the mouse.txt

which would result in:

	I got the idea for the mouse while attending a talk
	at a computer conference.  The speaker was so boring
	that I started daydreaming and hit upon the idea
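
If you would like to try the two searches side by side, the following is a minimal sketch. It assumes a POSIX shell with mktemp available, works in a scratch directory, and recreates mouse.txt without the attribution line. The variable names are only for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate mouse.txt from the article (attribution line omitted).
printf '%s\n' \
  'I got the idea for the mouse while attending a talk' \
  'at a computer conference.  The speaker was so boring' \
  'that I started daydreaming and hit upon the idea' > mouse.txt

# Case-sensitive search: only lines containing lowercase "the".
sensitive=$(grep the mouse.txt)

# Case-insensitive search: "The" on the second line now matches too.
insensitive=$(grep -i the mouse.txt)

echo "--- grep the mouse.txt"
echo "$sensitive"
echo "--- grep -i the mouse.txt"
echo "$insensitive"
```

The first search should print two lines and the second all three.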

grep is also useful for searching through multiple files. Taking the last example further, let's assume you have the two files popular.txt and clark.txt, in addition to mouse.txt, in the current directory. To illustrate:

	% cat popular.txt

produces:

	Where...the ENIAC is equipped with 18,000 vacuum tubes
	and weighs 30 tons, computers in the future may have 1,000
	vacuum tubes and perhaps weigh just 1-1/2 tons.

				Popular Mechanics, March 1949, p.258

and

	% cat clark.txt

produces:

	There is an old network saying: Bandwidth problems can be
	cured with money.  Latency problems are harder because the
	speed of light is fixed--you can't bribe God.

						David Clark, MIT

Now, assuming you wish to do a case-insensitive search on all three text files for the word 'the', you would do:

	% grep -i the *.txt

Assuming these are the only .txt files in the current directory, your output would appear as:

	clark.txt:There is an old network saying: Bandwidth problems can be
	clark.txt:cured with money.  Latency problems are harder because the
	mouse.txt:I got the idea for the mouse while attending a talk
	mouse.txt:at a computer conference.  The speaker was so boring
	mouse.txt:that I started daydreaming and hit upon the idea
	popular.txt:Where...the ENIAC is equipped with 18,000 vacuum tubes
	popular.txt:and weighs 30 tons, computers in the future may have 1,000

Two things to notice in this result are the filename shown to the left of the : (colon) and the alphabetical order in which the files are listed.

If you want the filename displayed when you are searching just one file, use /dev/null as a second file.
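
As a rough sketch of how the filename prefix behaves, the commands below compare a one-file search, a two-file search, and the /dev/null trick. The scratch directory, the one-line sample files, and the variable names are assumptions made for the example.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One-line excerpts of the article's sample files.
printf '%s\n' 'I got the idea for the mouse while attending a talk' > mouse.txt
printf '%s\n' 'cured with money.  Latency problems are harder because the' > clark.txt

# One file: grep prints only the matching line.
single=$(grep mouse mouse.txt)

# Two files: every match is prefixed with "filename:".
multi=$(grep the clark.txt mouse.txt)

# /dev/null is always empty, so it adds no matches of its own,
# but it counts as a second file and switches the prefix on.
with_null=$(grep mouse mouse.txt /dev/null)

echo "$single"
echo "$multi"
echo "$with_null"
```

Because /dev/null never contains any text, it cannot contribute a match, which is why it is a safe choice for the extra file.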

Extending the pattern matching capabilities of grep requires the use of special characters known as meta-characters. These meta-characters provide for a more general search pattern and, thus, a more powerful search capability. The following examples introduce some of these special characters along with specific examples based on our previous text files.

Note: You should place single quotes around expressions which contain non-alphabetic characters such as the ones we are about to review. The quotes keep the shell from interpreting those characters before grep sees them.

The . (dot) is similar to the shell's ? wildcard, meaning "any one character". As one example, suppose we have a file with the words 'too' and 'two' used throughout. The following search:

	% grep 't.o' *

would return all lines in which the words 'too' and 'two' were found, but not lines in which just 'to' was found. Possible misspellings like 'tao' and 'txo' would also appear.

The * (star) symbol means zero or more of the previous character. Here, a search such as:

	% grep 'too*' *

would result in all lines containing 'to' and 'too', as well as 'tooo' and 'toooo...', etc. Note that the star applies only to the character immediately before it. The pattern 't*o' means zero or more 't's followed by an 'o', so it matches every line that contains an 'o'.

By combining these two operators as in the following search:

	% grep 't.*o' *

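the results would be all lines in which a 't' is followed, anywhere later on the same line, by an 'o'. This includes the words 'to', 'too', and 'two', possible misspellings like 'tooo' and 'tao', and also longer stretches of text such as 'the mouse'.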
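
To see how the dot and the star differ in practice, here is a small sketch run against a hypothetical word list. The file words.txt, the helper function matches, and the variable names are invented for the example. It also assumes the standard paste utility is available to join the results onto one line.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical word list, one word per line.
printf '%s\n' to too two tao tooo on tip stop > words.txt

# Helper: list the words matched by a pattern on a single line.
matches() {
  grep "$1" words.txt | paste -s -d ' ' -
}

dot=$(matches 't.o')        # 't', any one character, then 'o'
star=$(matches 'too*')      # 'to' followed by zero or more extra 'o's
bare_star=$(matches 't*o')  # zero or more 't's, then 'o': any line with an 'o'
dot_star=$(matches 't.*o')  # 't', then anything (or nothing), then 'o'

echo "t.o  matches: $dot"
echo "too* matches: $star"
echo "t*o  matches: $bare_star"
echo "t.*o matches: $dot_star"
```

Notice that 'stop' shows up for 'too*' and 't.*o' because grep matches anywhere on the line, not just whole words.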

If you want to specify a group of characters to search for, enclose them between the [ (open bracket) and ] (close bracket). Placing the ^ symbol immediately after the open bracket matches any character not in the group. For example:

	% grep '[to]' *

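would print all the lines where either a 't' or an 'o' (or both) were found, in any order, such as 'to', 'stop', 'tee' or 'open'.

Conversely, if we wished to match characters other than 't's or 'o's:

	% grep '[^to]' *

would print all the lines containing at least one character that is neither a 't' nor an 'o', which in ordinary text is nearly every line. To print only the lines that contain neither letter, use the -v option, which inverts the search, as in grep -v '[to]' *.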
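
The difference between a negated group and an inverted search is easy to miss, so here is a short sketch of what grep does with each form. The word list, the helper function matches, and the variable names are hypothetical. The -v option, which prints the lines that do not match, is a standard grep option.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical word list, one word per line.
printf '%s\n' to stop tee open cab toot > words.txt

# Helper: run grep with the given arguments and join the results.
matches() {
  grep "$@" words.txt | paste -s -d ' ' -
}

either=$(matches '[to]')       # lines containing a 't' or an 'o'
negated=$(matches '[^to]')     # lines containing some character other than 't' or 'o'
inverted=$(matches -v '[to]')  # lines containing neither 't' nor 'o'

echo "[to]    matches: $either"
echo "[^to]   matches: $negated"
echo "-v [to] matches: $inverted"
```

Only 'to' and 'toot' are missing from the '[^to]' results, since they are the only words made up entirely of those two letters, while -v leaves just 'cab'.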

If you are looking for words within a contiguous range of letters such as 'uvwxyz', you could specify:

	% grep '[uvwxyz]' *

which would find all lines of text containing at least one of these letters. Equivalently,

	% grep '[u-z]' *

would return the same results.

Note that numbers and letters are treated similarly. For example, a search of a range of numbers:

	% grep '[012345]' *

	% grep '[0-5]' *

Either command will return all lines containing at least one of the digits 0, 1, 2, 3, 4, or 5.
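
As a quick check that a range and its spelled-out list behave the same, consider the sketch below. The sample lines and variable names are made up for the example. Setting LC_ALL=C is an assumption added here so that ranges follow plain ASCII order, since range behavior can vary with the locale.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumption: force the C locale so that [u-z] means exactly u through z.
LC_ALL=C
export LC_ALL

# Hypothetical sample lines.
printf '%s\n' 'room 7' 'call 42' 'abc' 'quiz' > sample.txt

explicit=$(grep '[uvwxyz]' sample.txt)  # every letter listed
range=$(grep '[u-z]' sample.txt)        # the same set written as a range
digits=$(grep '[0-5]' sample.txt)       # any digit from 0 through 5

echo "[uvwxyz] matches: $explicit"
echo "[u-z]    matches: $range"
echo "[0-5]    matches: $digits"
```

The line 'room 7' is not printed by the digit search because 7 falls outside the range 0 through 5.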

grep also supports searches for patterns at the beginning or end of a text line, using ^ (outside of brackets) for the beginning and $ for the end. So, searching through files for lines that begin with 'To' would appear as:

	% grep '^To' *

whereas searching through files for lines that end with 'to' or 'To' would appear as:

	% grep -i 'to$' *

Note that these are searches at the beginning and end of a text line, and not necessarily of a sentence. A text line terminates with a newline character, whereas a sentence may or may not.
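
The distinction between a line and a sentence can be sketched with a few made-up lines. The file lines.txt and the variable names are hypothetical.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: the second line has a sentence that starts
# with "To", but the line itself starts with "We".
printf '%s\n' \
  'To be or not to be' \
  'We rest. To sleep is next' \
  'Where to' \
  'Go To' > lines.txt

starts=$(grep '^To' lines.txt)       # lines beginning with "To"
ends=$(grep -i 'to$' lines.txt)      # lines ending with "to" in any case
ends_exact=$(grep 'to$' lines.txt)   # lines ending with lowercase "to" only

echo "--- ^To"
echo "$starts"
echo "--- -i to\$"
echo "$ends"
echo "--- to\$"
echo "$ends_exact"
```

The second line is not printed by the '^To' search, even though one of its sentences begins with 'To'.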

Given the use of meta-characters by the grep utility, special care is needed when it is those very characters you are searching for. In other words, if your search pattern contains one of the meta-characters used by grep, you must precede it with the \ (backslash) to negate its special meaning.

As an example, the search for '1.' would appear as:

	% grep '1\.' *

to show all number ones immediately followed by a period. This also applies to the \ character itself, which should be searched for as \\.
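
To see why the backslash matters, the sketch below runs the same search with and without it, and then searches for a backslash itself. The file list.txt, its contents, and the variable names are invented for the example.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample lines; the last one contains a literal backslash.
printf '%s\n' '1. First item' 'Chapter 12' '10 apples' 'a\b' > list.txt

# Unescaped dot: "1" followed by any one character.
unescaped=$(grep '1.' list.txt)

# Escaped dot: "1" followed by a literal period.
escaped=$(grep '1\.' list.txt)

# Escaped backslash: a literal backslash.
backslash=$(grep '\\' list.txt)

echo "--- 1."
echo "$unescaped"
echo "--- 1\\."
echo "$escaped"
echo "--- \\\\"
echo "$backslash"
```

Without the backslash, 'Chapter 12' and '10 apples' are printed as well, because the dot happily matches the '2' and the '0'.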

Hopefully this has explained grep in an easy to understand manner. If you have any comments or questions about this article, you can reach me at the email address at the top of this article.

- Chuck

Last modified: $Date: 1999/06/26 05:42:51 $