## A pattern searching utility
## Chuck Rouillard <>
grep is a utility that searches a stream of characters, such as a file,
for a word or pattern. When you use grep to search text files,
for example, each line that contains a match for your pattern is printed
in full.
Suppose, for example, you have a file named mouse.txt which contains:
I got the idea for the mouse while attending a talk
at a computer conference. The speaker was so boring
that I started daydreaming and hit upon the idea
- Doug Engelbart
To search for instances of 'mouse', you could type:
% grep mouse mouse.txt
which would result in the first line:
I got the idea for the mouse while attending a talk
Or, searching for instances of 'the':
% grep the mouse.txt
would result in the first and third lines:
I got the idea for the mouse while attending a talk
that I started daydreaming and hit upon the idea
but not the second line, which contains only the capitalized 'The'. This is
because grep is case sensitive by default. If we wanted to find all
instances of 'the' without regard for letter case, we could
do:
% grep -i the mouse.txt
which would result in:
I got the idea for the mouse while attending a talk
at a computer conference. The speaker was so boring
that I started daydreaming and hit upon the idea
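The searches so far can be reproduced with the short sketch below. It assumes a throwaway scratch directory (created here with mktemp, which is not part of the original example) and recreates mouse.txt exactly as shown above.

```shell
# Work in a scratch directory so no existing files are touched (an assumption
# made for this sketch; any empty directory would do).
cd "$(mktemp -d)"

# Recreate mouse.txt from the article.
cat > mouse.txt <<'EOF'
I got the idea for the mouse while attending a talk
at a computer conference. The speaker was so boring
that I started daydreaming and hit upon the idea
- Doug Engelbart
EOF

# Case-sensitive search: only lowercase 'the' matches (lines 1 and 3).
sensitive=$(grep the mouse.txt)

# Case-insensitive search: 'The' on line 2 now matches as well.
insensitive=$(grep -i the mouse.txt)

printf '%s\n' "$sensitive"
echo '---'
printf '%s\n' "$insensitive"
```

The first search prints two lines and the second prints three, matching the results described above.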
grep is also useful in searching through multiple files. Taking the
last example further, let's assume you have the two files popular.txt and
clark.txt, in addition to mouse.txt, in the current directory. To
illustrate:
% cat popular.txt
produces:
Where...the ENIAC is equipped with 18,000 vacuum tubes
and weighs 30 tons, computers in the future may have 1,000
vacuum tubes and perhaps weigh just 1-1/2 tons.
Popular Mechanics, March 1949, p.258
and
% cat clark.txt
produces:
There is an old network saying: Bandwidth problems can be
cured with money. Latency problems are harder because the
speed of light is fixed--you can't bribe God.
David Clark, MIT
Now, assuming you wish to do a case-insensitive search on all three
text files for the word 'the', you would do:
% grep -i the *.txt
Assuming these are the only .txt files in the current directory, your
output would appear as:
clark.txt:There is an old network saying: Bandwidth problems can be
clark.txt:cured with money. Latency problems are harder because the
mouse.txt:I got the idea for the mouse while attending a talk
mouse.txt:at a computer conference. The speaker was so boring
mouse.txt:that I started daydreaming and hit upon the idea
popular.txt:Where...the ENIAC is equipped with 18,000 vacuum tubes
popular.txt:and weighs 30 tons, computers in the future may have 1,000
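Two things to notice in this result are the filename shown to the left
of the : (colon) and the alphabetical order in which the files are
listed. Notice also that 'There' in clark.txt counts as a match: grep
looks for the pattern anywhere in a line, not only as a whole word.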
If you want the filename displayed when you are searching just one
file, use /dev/null as a second file.
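A minimal sketch of the /dev/null trick follows. It assumes a scratch directory and a one-line mouse.txt created just for the demonstration.

```shell
# Scratch directory and a one-line sample file (assumptions for this sketch).
cd "$(mktemp -d)"
printf 'I got the idea for the mouse while attending a talk\n' > mouse.txt

# One file named: grep prints only the matching line.
plain=$(grep mouse mouse.txt)

# Two files named, the second one always empty: grep adds the filename prefix.
prefixed=$(grep mouse mouse.txt /dev/null)

printf '%s\n' "$plain" "$prefixed"
```

Because /dev/null never contains anything, it can never add matches of its own; it only makes grep behave as it does for a multi-file search.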
Extending the pattern matching capabilities of grep requires
the use of special characters known as meta-characters. These meta-characters
provide for a more general search pattern and, thus, a more powerful
search capability. The following examples introduce some of these
special characters along with specific examples based on our previous
text files.
Note: You should place single quotes around expressions
which contain non-alphabetic characters such as the ones we are about to
review.
The . (dot) is similar to the shell's ? wildcard, meaning
"any one character". As one example, suppose we have a file with the words
'too' and 'two' used throughout. The following search:
% grep 't.o' *
would return all lines in which the words 'too' and 'two' were found,
but not lines in which just 'to' was found. Possible misspellings like
'tao' and 'txo' would also appear.
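The * (star) symbol means zero or more of the previous
character. To find 'to' followed by any number of additional 'o's, our
previous search might appear as:
% grep 'too*' *
which would result in all lines containing 'to' and 'too', as well as
'tooo' and 'toooo...', etc. Note that a pattern such as 't*o' is far less
selective: since zero 't's is an acceptable match, it matches every line
that contains an 'o'.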
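By combining these two operators as in the following search:
% grep 't.*o' *
the results would be all lines containing the words 'to', 'too', and
'two', as well as possible misspellings like 'tooo' and 'tao'. Because
.* matches any run of characters, this pattern also matches any line in
which a 't' is followed, however much later, by an 'o'.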
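To see how the dot and the star behave, here is a small sketch run against a hypothetical word list, one word per line. The file name words.txt, its contents, and the extra patterns 'too*' and 't*o' are illustrative choices, not part of the original examples.

```shell
# Scratch directory and a made-up word list (assumptions for this sketch).
cd "$(mktemp -d)"
printf '%s\n' to too two tao tooo on tea > words.txt

# paste joins the matching lines with spaces so each result fits on one line.
dot=$(grep 't.o' words.txt | paste -s -d ' ' -)        # t, any one character, o
star=$(grep 'too*' words.txt | paste -s -d ' ' -)      # 'to' plus zero or more o's
bare_star=$(grep 't*o' words.txt | paste -s -d ' ' -)  # zero or more t's, then o
dot_star=$(grep 't.*o' words.txt | paste -s -d ' ' -)  # t, anything, then o

echo "t.o  : $dot"        # too two tao tooo
echo "too* : $star"       # to too tooo
echo "t*o  : $bare_star"  # to too two tao tooo on
echo "t.*o : $dot_star"   # to too two tao tooo
```

The 't*o' result includes 'on' because the star allows zero occurrences of the 't', so any line with an 'o' qualifies.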
If you want to specify a group of characters to search for, the
[ (open bracket) and ] (close bracket) are used,
optionally in conjunction with the ^ symbol to exclude a group of
characters. A bracketed group matches exactly one character. For example:
% grep '[to]' *
would print all the lines where either a 't' or an 'o' (or both) were
found, such as lines containing 'to', 'stop', 'tee' or 'open'.
Conversely, placing ^ first inside the brackets:
% grep '[^to]' *
would print all the lines containing at least one character that is
neither a 't' nor an 'o', which is nearly every line. To print only the
lines that contain no 't's or 'o's at all, use the -v option, which
inverts the match:
% grep -v '[to]' *
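The difference between a bracketed group, a negated group, and an inverted match is easy to mix up, so here is a small sketch. The word list is hypothetical, and -v is grep's standard option for printing the lines that do not match.

```shell
# Scratch directory and a made-up word list (assumptions for this sketch).
cd "$(mktemp -d)"
printf '%s\n' to stop tee open dull toot > words.txt

# Lines containing a 't' or an 'o' somewhere.
either=$(grep '[to]' words.txt | paste -s -d ' ' -)

# Lines containing at least one character that is neither 't' nor 'o'.
other=$(grep '[^to]' words.txt | paste -s -d ' ' -)

# Lines containing no 't' and no 'o' at all: invert the first search.
neither=$(grep -v '[to]' words.txt | paste -s -d ' ' -)

echo "[to]     : $either"   # to stop tee open toot
echo "[^to]    : $other"    # stop tee open dull
echo "-v [to]  : $neither"  # dull
```

Only 'to' and 'toot' fail the '[^to]' search, since they are the only lines made up entirely of 't's and 'o's.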
If you are looking for words within a contiguous range of letters such
as 'uvwxyz', you could specify:
% grep '[uvwxyz]' *
which would find all lines of text containing at least one of these
letters. Equivalently,
% grep '[u-z]' *
would return the same results.
Note that numbers and letters are treated similarly. For example, a
search of a range of numbers:
% grep '[012345]' *
or
% grep '[0-5]' *
will return all lines containing at least one of the digits 0, 1, 2, 3, 4, or 5.
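The equivalence of a spelled-out group and a range can be checked with a sketch like the following. The sample lines are made up, and LC_ALL=C is set on the assumption that plain ASCII ordering is wanted for the ranges, since range behavior can vary between locales.

```shell
# Scratch directory, sample lines, and the C locale (assumptions for this sketch).
cd "$(mktemp -d)"
export LC_ALL=C
printf '%s\n' 'vacuum' 'seed' '18,000 tubes' 'March 1949' 'page 678' > sample.txt

# Letters u through z: spelled out, then as a range.
letters_list=$(grep '[uvwxyz]' sample.txt)
letters_range=$(grep '[u-z]' sample.txt)

# Digits 0 through 5: spelled out, then as a range.
digits_list=$(grep '[012345]' sample.txt)
digits_range=$(grep '[0-5]' sample.txt)

printf '%s\n' "$letters_range"   # vacuum, then 18,000 tubes
echo '---'
printf '%s\n' "$digits_range"    # 18,000 tubes, then March 1949
```

The line 'page 678' is not printed by the digit search because none of its digits falls between 0 and 5.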
grep also supports searches for patterns at the beginning or
end of a text line. So, searching through files for lines that begin with
'To', would appear as:
% grep '^To' *
whereas searching through files for lines that end with 'to' or 'To'
would appear as:
% grep -i 'to$' *
Note that these are searches at the beginning and end of a text line,
and not necessarily a sentence. A text line terminates with a newline
character, whereas a sentence may or may not.
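A short sketch of both anchors, using a hypothetical file named lines.txt whose contents were chosen so that some lines merely contain 'to' without beginning or ending with it:

```shell
# Scratch directory and sample lines (assumptions for this sketch).
cd "$(mktemp -d)"
cat > lines.txt <<'EOF'
To be or not to be
I went to
Where To
Into the woods
EOF

# Lines that begin with 'To' (case sensitive).
begins=$(grep '^To' lines.txt)

# Lines that end with 'to' in any letter case.
ends=$(grep -i 'to$' lines.txt)

printf '%s\n' "$begins"   # To be or not to be
echo '---'
printf '%s\n' "$ends"     # I went to, then Where To
```

'Into the woods' contains 'to' but matches neither search, because the pattern is neither at the very start nor at the very end of the line.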
Because grep gives meta-characters a special meaning, you need to take
extra care when it is those characters themselves you are searching
for. In other words, if your search
pattern contains one of the meta-characters used by grep, you
must use the \ (backslash) to negate its special meaning.
As an example, the search for '1.' would appear as:
% grep '1\.' *
to show all number ones immediately followed by a period. This also
applies to the \ character itself, which should be searched for as
\\.
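The effect of the backslash can be seen by running the same search with and without it. The file escape.txt and its three lines are hypothetical, chosen so that one line has a literal '1.' and another has a literal backslash.

```shell
# Scratch directory and sample lines (assumptions for this sketch).
# printf '%s\n' writes each argument unchanged, so the backslash stays literal.
cd "$(mktemp -d)"
printf '%s\n' 'version 1.5' 'chapter 12' 'C:\temp' > escape.txt

# Unescaped dot: '1' followed by any one character.
unescaped=$(grep '1.' escape.txt)

# Escaped dot: '1' followed by a literal period.
escaped=$(grep '1\.' escape.txt)

# Escaped backslash: a literal backslash.
backslash=$(grep '\\' escape.txt)

printf '%s\n' "$unescaped"   # version 1.5, then chapter 12
echo '---'
printf '%s\n' "$escaped"     # version 1.5
echo '---'
printf '%s\n' "$backslash"   # C:\temp
```

Note the single quotes around each pattern: they keep the shell from interpreting the backslash before grep sees it.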
Hopefully this has explained grep in an easy-to-understand manner. If
you have any comments or questions about this article, you can reach
me at the email address at the top of this article.
- Chuck