CIS 2.55 - Notes 0011

The last notes discussed how the request/response nature of HTTP works and how parameters are encoded as part of the request. In these notes, we will learn how to write Perl code that decodes these parameters.

Note that all examples here are for Windows. To change them for any other operating system, modify the first line of each script to point to your installation of Perl (for example, #!/usr/bin/perl on many Unix-like systems, where the script file must also be made executable). There is also some mention of Apache, but these scripts should work on ANY CGI-compatible web server.

Introduction; Basic CGI

Let us examine the simplest case: a plain CGI page that doesn't accept any parameters and simply displays "Hello World" on the screen.

#!c:/Perl/bin/Perl.exe

print << "EOF";
Content-type: text/html

<html>
<head><title>Hello World!</title></head>
<body>
    <h3>Hello World!</h3>
</body>
</html>
EOF

We name this file hello.pl and place it in the C:\Apache2\cgi-bin directory (or wherever your web server keeps its CGI programs).

With Apache running, all we have to do now is point our browser to http://localhost/cgi-bin/hello.pl, and, magically, we see Hello World! on the screen.

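In the above code, notice that we use a here-document (print << "EOF"; ... EOF): Perl prints everything between that line and the closing EOF line as we typed it, substituting the values of any variables it contains (the next examples rely on this). Also notice that we print the Content-type: text/html header line and then leave an empty line; the empty line tells the web server (and, through it, the browser) that the headers are over and that the content of the response follows.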
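To make the header/body split more visible, here is a minimal sketch that builds the same response as a single string instead of with a here-document. The sub name `html_response` is hypothetical, used only for illustration; the important part is the "\n\n" that ends the header block.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any library): returns a complete CGI
# response as one string. The header block must end with an empty line,
# so the header is followed by "\n\n" before the HTML starts.
sub html_response {
    my ($title, $body) = @_;
    my $header = "Content-type: text/html\n\n";
    my $html   = "<html>\n"
               . "<head><title>$title</title></head>\n"
               . "<body>\n$body\n</body>\n"
               . "</html>\n";
    return $header . $html;
}

print html_response('Hello World!', '    <h3>Hello World!</h3>');
```

Whether you use a here-document or separate print statements is a matter of taste; the server only cares that the header lines come first and are followed by one empty line.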

Manually Accepting Form Data (`GET`)

We shall now accept form data using a GET request. For that, we first need to construct a plain HTML page with the form:

<HTML>
<HEAD><TITLE>Signup!</TITLE></HEAD>
<BODY>

<FORM ACTION="/cgi-bin/signup-get.pl" METHOD="GET">
    <TABLE BORDER="0">
        <TR>
            <TD>Name:</TD>
            <TD><INPUT TYPE="TEXT" SIZE="30" NAME="name"></TD>
        </TR>
        <TR>
            <TD>E-Mail:</TD>
            <TD><INPUT TYPE="TEXT" SIZE="30" NAME="email"></TD>
        </TR>
        <TR>
            <TD COLSPAN="2" ALIGN="RIGHT">
                <INPUT TYPE="SUBMIT" VALUE="Signup!">
            </TD>
        </TR>
    </TABLE>
</FORM>

</BODY>
</HTML>

The page above has a form with name and e-mail fields and a "Signup!" button. This is similar to nearly all forms out there. We name this file signup.html and place it in the C:\Apache2\htdocs directory. We can now point our browser to http://localhost/signup.html to see it.

The next thing we need is the signup-get.pl CGI file that will accept the form data. We start by writing a simple CGI script that just displays the data:

#!c:/Perl/bin/Perl.exe

print << "EOF";
Content-type: text/html

<html>
<head><title>Hello World!</title></head>
<body>
    <h3>Submitted Data: $ENV{QUERY_STRING}</h3>
</body>
</html>
EOF

When we type "John Doe" in the name field of the form and "johndoe@yahoo.com" in the e-mail field, and click the Signup! button, the resulting page (the CGI code above) displays:

Submitted Data: name=John+Doe&email=johndoe%40yahoo.com

The value of $ENV{QUERY_STRING} is name=John+Doe&email=johndoe%40yahoo.com. Notice that the browser replaced the space with a + and the @ with %40; this is what is known as URL-encoded data. Characters that are not safe in a URL are converted this way: a space becomes +, and most other special characters become a % followed by the character's two-digit hexadecimal code. All form data has to go through this conversion.
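To see the rules from the other side, here is a small sketch of the encoding step a browser performs before sending the form. The sub `url_encode` is a hypothetical helper, not part of any library, and it assumes a plain byte string; it follows the usual form-encoding rules, where letters, digits and the characters * - . _ are left as they are.

```perl
use strict;
use warnings;

# Hypothetical helper, shown only to illustrate the encoding rules.
# Assumes a byte string. Letters, digits and * - . _ are left alone,
# every other byte (except space) becomes "%" plus two hex digits,
# and finally each space becomes "+".
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9*\-._ ])/sprintf("%%%02X", ord($1))/ge;
    $text =~ tr/ /+/;
    return $text;
}

print url_encode('John Doe'), "\n";            # John+Doe
print url_encode('johndoe@yahoo.com'), "\n";   # johndoe%40yahoo.com
```

Note the order: a literal + in the input is turned into %2B before spaces are turned into +, so the two can never be confused when decoding.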

You should also notice that if we split the query string on &, we end up with the string name=John+Doe and the string email=johndoe%40yahoo.com. We can further split each of these on = to get what are called name-value pairs, and then undo the URL encoding in each name and value. Here's the whole process in code:

#!c:/Perl/bin/Perl.exe

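if($ENV{REQUEST_METHOD} eq "GET"){
   # Break the query string into name=value pairs
   foreach(split /&/,$ENV{QUERY_STRING}){
      # Split each pair on the first = only
      ($name,$value) = split /=/, $_, 2;
      # Undo the URL encoding: + becomes a space, %XX becomes a character
      $name =~ tr/+/ /;
      $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/ge;
      $value =~ tr/+/ /;
      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/ge;
      # Store the decoded value under its decoded name
      $params{$name} = $value;
   }
}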

print << "EOF";
Content-type: text/html

<html>
<head><title>Thank you</title></head>
<body>
<h3>Thank you for submitting the form!</h3>

<h4>You are now registered as $params{'name'} using
e-mail $params{'email'}</h4>

</body>
</html>
EOF

Now if we submit the form, we get a page saying that we are signed up as "John Doe" using e-mail "johndoe@yahoo.com". Cool, huh?

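In the code, notice that we first check that this is a GET request; we then split (and loop over) the QUERY_STRING environment variable. For each name-value pair, we turn every + back into a space and every %XX sequence back into the character it stands for, and store the result in the %params hash.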
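If you find yourself parsing query strings in several scripts, the same loop can be wrapped in a sub. The name `parse_query` is hypothetical; this is a sketch of the technique above with two small additions that are assumptions of this sketch: empty pairs (as in a=1&&b=2) are skipped, and a name without an = gets an empty value.

```perl
use strict;
use warnings;

# Hypothetical sub: the parsing loop from signup-get.pl, wrapped up so it
# can be reused with any URL-encoded string (QUERY_STRING or a POST body).
sub parse_query {
    my ($encoded) = @_;
    my %params;
    foreach my $pair (split /&/, $encoded) {
        next unless length $pair;              # skip empty pairs, e.g. "a=1&&b=2"
        # Split on the first "=" only; a bare name gets an empty value.
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        # Decode both parts in place: "+" -> space, "%XX" -> byte.
        foreach ($name, $value) {
            tr/+/ /;
            s/%([a-fA-F0-9]{2})/pack("C", hex($1))/ge;
        }
        $params{$name} = $value;
    }
    return %params;
}

my %params = parse_query('name=John+Doe&email=johndoe%40yahoo.com');
print "$params{name} <$params{email}>\n";   # John Doe <johndoe@yahoo.com>
```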

You should also notice that on these past few requests the URL of the CGI page turned into http://localhost/cgi-bin/signup-get.pl?name=John+Doe&email=johndoe%40yahoo.com. This is because we are doing a GET request, and the form data is encoded as part of the URL.

Now, obviously instead of just displaying a message saying "you are now registered as..." we could actually write the code that does the registration. It is VERY common to find database code inside CGI scripts (otherwise what's the point of having CGI?). We will discuss databases in later notes.

Manually Accepting Form Data (`POST`)

We will now discuss POST requests. If you remember, in a GET request, data is encoded as part of the URL. This has many limitations. For one, some browsers limit the length of a URL (to 1 KB in some browsers, or more recently to 4 KB). Another limitation is that some form data, like passwords, is confidential. You don't really want to flash your password in clear text as part of the URL, which may be cached for other users of that computer to see. What you want is a way to send information from your form to the server in the body of the request rather than in the URL. This is where the POST request comes into play. (Keep in mind that POST data still travels unencrypted unless the site uses HTTPS; it is simply kept out of the URL.)

The form is identical to our previous one, except that we change METHOD="GET" to METHOD="POST" (and point ACTION at a different CGI script):

<HTML>
<HEAD><TITLE>Signup!</TITLE></HEAD>
<BODY>

<FORM ACTION="/cgi-bin/signup-post.pl" METHOD="POST">
    <TABLE BORDER="0">
        <TR>
            <TD>Name:</TD>
            <TD><INPUT TYPE="TEXT" SIZE="30" NAME="name"></TD>
        </TR>
        <TR>
            <TD>E-Mail:</TD>
            <TD><INPUT TYPE="TEXT" SIZE="30" NAME="email"></TD>
        </TR>
        <TR>
            <TD COLSPAN="2" ALIGN="RIGHT">
                <INPUT TYPE="SUBMIT" VALUE="Signup!">
            </TD>
        </TR>
    </TABLE>
</FORM>

</BODY>
</HTML>

Just as before, what we need now is the signup-post.pl script to accept the POST data:

#!c:/Perl/bin/Perl.exe

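if($ENV{REQUEST_METHOD} eq "POST"){
   # Read CONTENT_LENGTH bytes of form data from standard input
   # ($buffer rather than $b, which Perl uses as a special sort variable)
   read STDIN,$buffer,$ENV{CONTENT_LENGTH};
   # The body has the same name=value&name=value format as QUERY_STRING
   foreach(split /&/,$buffer){
      ($name,$value) = split /=/, $_, 2;
      # Undo the URL encoding: + becomes a space, %XX becomes a character
      $name =~ tr/+/ /;
      $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/ge;
      $value =~ tr/+/ /;
      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/ge;
      $params{$name} = $value;
   }
}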

print << "EOF";
Content-type: text/html

<html>
<head><title>Thank you</title></head>
<body>
<h3>Thank you for submitting the form!</h3>

<h4>You are now registered as $params{'name'} using
e-mail $params{'email'}</h4>

</body>
</html>
EOF

Notice that in this version, we check that the request method is POST and then read CONTENT_LENGTH bytes from STDIN (standard input). What we read is in exactly the same format as QUERY_STRING in the previous examples, which is why we can use exactly the same parsing technique to get at the data.

The output of this code is identical to the GET request version, so I won't post it again here. The only thing that you should notice from this example is that the data is no longer part of the URL. In fact, all you see is: http://localhost/cgi-bin/signup-post.pl and that's it.
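The single read call in signup-post.pl is fine for small forms, but in general a read may return fewer bytes than requested. A more defensive sketch loops until CONTENT_LENGTH bytes have arrived. The sub `read_body` is hypothetical, and passing the filehandle in as an argument (instead of always using STDIN) is an assumption made so the code can be tried without a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $length bytes of request body from $fh.
# A single read() may return fewer bytes than asked for, so keep reading,
# appending at the end of $buffer, until the whole body has arrived.
sub read_body {
    my ($fh, $length) = @_;
    my $buffer = '';
    binmode $fh;    # no newline translation (matters on Windows)
    while (length($buffer) < $length) {
        my $got = read($fh, $buffer, $length - length($buffer), length($buffer));
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # end of input: body shorter than promised
    }
    return $buffer;
}

# In a real CGI script you would call something like:
#   my $body = read_body(\*STDIN, $ENV{CONTENT_LENGTH} || 0);
```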

Use `POST` as opposed to `GET`

The POST method is ideal whenever you're submitting forms. Always try to use POST, and only resort to GET when POST is inconvenient.

GET comes in useful in situations where you only have a link, because you can specify parameters as part of the URL itself. For example:

<a href="/cgi-bin/somecgi.pl?id=23442&p=2345&q=2423">some link here</a>

Here, id=23442&p=2345&q=2423 will be sent as part of the GET request whenever someone clicks on that link.
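Any values you put in such a link must be URL encoded, exactly like form data. A minimal sketch that builds the link above from name-value pairs might look like this; the sub `build_query` is hypothetical, and it applies the same encoding rules the browser uses for forms.

```perl
use strict;
use warnings;

# Hypothetical helper: builds "name=value&name=value..." from a list of
# name/value pairs (kept in the given order), URL-encoding each part.
sub build_query {
    my @pairs = @_;
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        foreach ($name, $value) {
            # Unsafe characters -> %XX, then spaces -> "+"
            s/([^A-Za-z0-9*\-._ ])/sprintf("%%%02X", ord($1))/ge;
            tr/ /+/;
        }
        push @parts, "$name=$value";
    }
    return join '&', @parts;
}

my $href = '/cgi-bin/somecgi.pl?' . build_query('id' => 23442, 'p' => 2345, 'q' => 2423);
print qq{<a href="$href">some link here</a>\n};
```

The pairs are passed as a list rather than a hash so that the parameters appear in the link in the order you wrote them.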

Easier Form Data???

Is there an easier way to get the form data besides having to remember how to parse it every time? Well, there is. Just as ASP, PHP, JSP, etc., parse form data for you, you can do the same in Perl with the CGI module (recent Perl distributions no longer bundle it, but it can be installed from CPAN). Here's an example that does the same as our previous examples:

#!c:/Perl/bin/Perl.exe

use CGI;

$q = CGI->new;

$name = $q->param('name');
$email = $q->param('email');

print << "EOF";
Content-type: text/html

<html>
<head><title>Thank you</title></head>
<body>
<h3>Thank you for submitting the form!</h3>

<h4>You are now registered as $name using e-mail $email</h4>

</body>
</html>
EOF

Notice that it only takes a few lines for us to get the parameters:

use CGI;

$q = CGI->new;

$name = $q->param('name');
$email = $q->param('email');

Also notice that we never specify which type of request (GET or POST) we are interested in. That is because the CGI library determines which kind of request it is and uses a parsing method appropriate to that request. From the programming perspective, all we care about is getting parameters, and that's what it gives us. Simple?
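To make "the CGI library determines which request it is" concrete, here is a greatly simplified sketch of the idea, combining the GET and POST code from earlier. It is not how the CGI module is actually implemented (the real module handles many more cases, such as file uploads); the sub `get_params` and its optional filehandle argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical, simplified imitation of what a CGI library does for us:
# pick the data source from REQUEST_METHOD, then decode it the same way.
sub get_params {
    my ($in) = @_;
    $in = \*STDIN unless defined $in;    # filehandle for the POST body
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    my $data   = '';
    if ($method eq 'POST') {
        read($in, $data, $ENV{CONTENT_LENGTH} || 0);    # body of the request
    } else {
        $data = $ENV{QUERY_STRING} || '';               # part of the URL
    }
    my %params;
    foreach my $pair (split /&/, $data) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        foreach ($name, $value) {    # undo the URL encoding
            tr/+/ /;
            s/%([a-fA-F0-9]{2})/pack("C", hex($1))/ge;
        }
        $params{$name} = $value;
    }
    return \%params;
}
```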

Heh? Why bother learning the manual method?

Why did we bother to learn the manual method when such a convenient and easy way of getting parameters exists? Well, just as with every convenient method, there are limitations. We cannot parse other kinds of data that may come as part of the request.

For example, it works great for CGI pages, etc., but if we wanted to create a Web Service that would accept XML SOAP content as part of the request, we would need to parse that ourselves. Also, it is extremely useful to know how things work on the inside when you use them.

(That is: instead of relying on the library, you can do it yourself. You also now know how ASP, JSP, PHP, etc., get their parameters.)

Well, that's about it for basic CGI. You can learn many of the more advanced features from books; I especially recommend O'Reilly's CGI Programming with Perl.

© 2006, Particle

CGI Continued
