CIS 2.55 - Homeworks

HW1: Read Chapter 1 of Programming Perl. Install Perl if you need to (e.g., if you're using Windows). Write a 'Hello World' program (submit it online).

HW2: Write a program that prompts the user to enter a list of numbers. Your program then sorts those numbers (write the sorting code yourself; do not use the `sort' function) and outputs the sorted list.
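One possible approach is insertion sort: keep a sorted prefix and insert each new number into its place. This is a sketch that assumes the numbers are typed on one line separated by spaces and silently skips tokens that do not look like numbers (both are assumptions, not requirements of the assignment).

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Insertion sort: keep a sorted prefix and insert each new element into it.
sub insertion_sort {
    my @list = @_;
    for my $i (1 .. $#list) {
        my $key = $list[$i];
        my $j   = $i - 1;
        # Shift every larger element one slot to the right.
        while ($j >= 0 && $list[$j] > $key) {
            $list[$j + 1] = $list[$j];
            $j--;
        }
        $list[$j + 1] = $key;              # drop the element into the gap
    }
    return @list;
}

print "Enter some numbers separated by spaces: ";
my $line = <STDIN>;
if (defined $line) {
    # Keep only the tokens that Perl recognizes as numbers.
    my @numbers = grep { looks_like_number($_) } split ' ', $line;
    print join(' ', insertion_sort(@numbers)), "\n";
}
```

Insertion sort is O(n^2) in the worst case, which is fine for a list typed in by hand; merge sort is the usual next step if you want O(n log n).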

HW3: Given a scalar containing names, each followed by an age:

$names = "John 20, Bill  34,Jane 28,  Wall 18,Tom  19";

Using Perl, display two lists, one sorted by Name, the other sorted by Age. The output of the program should be something like this:

By Name: Bill  34, Jane 28, John 20, Tom  19, Wall 18,
By Age: Wall 18, Tom  19, John 20, Jane 28, Bill  34,

(Do not correct the spacing in the $names scalar. It is like that for a reason.)

As an extra challenge, try to cram as much functionality into as little space as possible. (I was able to write this program in 2 lines.)
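A readable (not two-line) sketch: splitting on commas while trimming the whitespace around each comma keeps the uneven spacing inside each entry, which is why the output still shows "Bill  34". Sorting by name compares the leading word as a string; sorting by age compares the trailing number numerically.

```perl
use strict;
use warnings;

my $names = "John 20, Bill  34,Jane 28,  Wall 18,Tom  19";

# Split on commas, dropping the whitespace around each comma but keeping
# the spacing between each name and its age.
my @people = split /\s*,\s*/, $names;

# Sort by name (the leading word) and by age (the trailing number).
my @by_name = sort { ($a =~ /^(\S+)/)[0] cmp ($b =~ /^(\S+)/)[0] } @people;
my @by_age  = sort { ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0] } @people;

print "By Name: ", join(', ', @by_name), ",\n";
print "By Age: ",  join(', ', @by_age),  ",\n";
```

This prints exactly the two lines shown above, including the trailing commas.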

HW4: Write a Perl program to exactly compute 1000! (That's 1000 factorial). Refer to the Arithmetics link on the side for an idea of how to do this. This is -much- more complicated than it appears. Don't use the Math::* modules. You just need arrays for this.
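A sketch of the array approach, assuming one decimal digit per array element stored least significant digit first: multiplying the running product by each k from 2 to n is ordinary schoolbook multiplication with a carry. Storing several digits per element (e.g., base 10000) would cut the work, at the cost of zero-padding each element when printing.

```perl
use strict;
use warnings;

# Compute n! exactly. The number is kept in @digits, one decimal digit per
# element, least significant digit first, and multiplied by 2, 3, ..., n in turn.
sub factorial_digits {
    my ($n) = @_;
    my @digits = (1);                        # start from the number 1
    for my $k (2 .. $n) {
        my $carry = 0;
        for my $i (0 .. $#digits) {          # schoolbook multiplication by $k
            my $prod = $digits[$i] * $k + $carry;
            $digits[$i] = $prod % 10;
            $carry      = int($prod / 10);
        }
        while ($carry > 0) {                 # the carry may add new digits
            push @digits, $carry % 10;
            $carry = int($carry / 10);
        }
    }
    return join '', reverse @digits;         # most significant digit first
}

print factorial_digits(1000), "\n";
```

The result has 2568 digits, the last 249 of which are zeros (one for each factor of 5 in 1000!).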

HW5: Write a CGI script that displays a form asking for the user's name. When the user submits the form, the script displays `<h1>Hello $Name</h1>'. You can test it by installing Apache and Perl on your computer, or by other means.

If you feel like it, you might want to implement a Perl script processor (one that lets you embed Perl code inside HTML pages), to make your scripts look similar to ASP, PHP, etc.
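A minimal sketch of the basic script (not the script processor), assuming the form submits with GET and the text field is named Name; both are illustrative choices. It decodes QUERY_STRING by hand instead of relying on the CGI module, and HTML-escapes the name before echoing it so that input such as <script> is shown as text rather than run by the browser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded string (e.g. "Name=John+Doe").
sub parse_query {
    my ($qs) = @_;
    my %params;
    for my $pair (split /&/, $qs // '') {
        my ($key, $value) = split /=/, $pair, 2;
        for ($key, $value) {
            next unless defined;
            tr/+/ /;                                  # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;       # %XX encodes a byte
        }
        $params{$key} = $value // '' if defined $key;
    }
    return \%params;
}

# Escape characters that are special in HTML, so user input is shown as text.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

my $params = parse_query($ENV{QUERY_STRING});
print "Content-Type: text/html\n\n";
if (defined $params->{Name} && length $params->{Name}) {
    my $name = html_escape($params->{Name});
    print "<html><body><h1>Hello $name</h1></body></html>\n";
}
else {
    # No name yet: show the form, which submits back to this same script.
    print <<'HTML';
<html><body>
<form method="get" action="">
  Your name: <input type="text" name="Name">
  <input type="submit" value="Submit">
</form>
</body></html>
HTML
}
```

Placed in Apache's cgi-bin directory (and made executable on UNIX), the first request shows the form and the submission shows the greeting.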

HW6: Given a file input.xml, determine whether it is well-formed XML (the only output of your program is then "yes" or "no"). Well-formed XML means that each tag has a closing tag, unless it is a single (self-closing) tag. For example, the tag <p> MUST have a closing tag </p>, but the tag <br/> does not (since it ends with />). The tags MUST also be properly nested: for example, <a><b></b></a> is properly nested, but <a><b></a></b> is not. For this question, you can assume that there is no XML prolog, no XML comments, and no tag attributes (you only have tags like <abc>, and no tags like <abc xyz="foo">). Sample input:

<html>
   <head>
       <title>Hi! This is XML!</title>
   </head>
   <body>
       <p>This is a simple well formatted XML page.</p>
       <p>(well, it is html, but we can pretend that it's XML)</p>
   </body>
</html>

For this input the answer would be "yes". An example of invalid input would be:

<html>
   <head>
       <title>Hi! This is XML!</title>
   </head>
   <body>
       <p>This is a simple well formatted XML page.<br>:-)
       <p>(well, it is html, but we can pretend that it's XML)
   </body>
</html>

What makes it invalid is that the <p> tags are not matched with closing </p> tags, and that the <br> tag has no closing tag (and is not written as <br/>).

Do not use any XML libraries for this: you must write the code yourself to parse the file.

Extra (if you feel up to it): Add support for attributes, handle comments, handle XML CDATA sections, etc. Tip: You can use Google to find more info about this stuff.
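A stack-based sketch of the basic check (not the extras), under the simplified rules above: scan the input one piece at a time (text, an opening tag, a closing tag, or a self-closing tag), push each opening tag, and require every closing tag to match the tag on top of the stack. Treating an input with no elements at all as "no" is an assumption of this sketch.

```perl
use strict;
use warnings;

# Return 1 if $xml is well-formed under HW6's simplified rules
# (no prolog, no comments, no attributes), 0 otherwise.
sub is_well_formed {
    my ($xml) = @_;
    $xml //= '';
    my @stack;           # names of the currently open tags, innermost last
    my $elements = 0;    # how many elements we have seen in total
    while (1) {
        if ($xml =~ /\G[^<>]+/gc) {
            next;                                   # character data between tags
        }
        elsif ($xml =~ m{\G<(/?)([A-Za-z_][\w.-]*)\s*(/?)>}gc) {
            my ($closing, $name, $empty) = ($1, $2, $3);
            return 0 if $closing && $empty;         # "</a/>" is not a valid tag
            if ($closing) {
                # A closing tag must match the most recently opened tag.
                return 0 unless @stack && pop(@stack) eq $name;
            }
            else {
                $elements++;
                push @stack, $name unless $empty;   # <br/> needs no closing tag
            }
        }
        elsif ($xml =~ /\G\z/gc) {
            last;                                   # consumed the whole input
        }
        else {
            return 0;                               # stray '<' or '>', or a bad tag
        }
    }
    # Every opened tag must have been closed, and there must be at least one.
    return (!@stack && $elements) ? 1 : 0;
}

# Usage: perl wellformed.pl input.xml   (prints "yes" or "no")
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $xml = do { local $/; <$fh> };
    close $fh;
    print is_well_formed($xml) ? "yes\n" : "no\n";
}
```

The \G anchor with the /gc flags makes each regex continue exactly where the previous one stopped, so the input is consumed left to right without copying substrings.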

HW7: Write a tech stock information grabber program. For stock information, your program will get its data from: http://finance.yahoo.com/

The tech-stock grabber program should display the following information: issue-name, last-sale-price, total-outstanding-shares-qty, market-capitalization-amt.

It should do that for at least these stock symbols: MSFT, GOOG, ORCL, IBM, SUNW, INTC, AMD.

Tip: You can (but are not required to) use the "GET" utility (or "wget" on UNIX) to download the page. Good luck!

Tip2: Use regular expressions.

Tip3: You can also use the code below to retrieve a page :-)

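use strict;
use warnings;
use IO::Socket;

# Fetch a URL over plain HTTP and return the raw response (headers + body).
sub webget {
    my ($url) = @_;
    # Split the URL into host, optional port, and path (path defaults to "/").
    my ($host, $port, $doc) = $url =~ m|^http://([^/:]+)(?::(\d+))?(/.*)?$|
        or die "Cannot parse URL: $url\n";
    $port ||= 80;
    $doc  ||= "/";
    my $EOL   = "\015\012";    # CRLF line terminator required by HTTP
    my $BLANK = $EOL x 2;      # an empty line ends the request headers
    my $remote = IO::Socket::INET->new(
        Proto    => "tcp",
        PeerAddr => $host,
        PeerPort => $port,
    ) or die "Cannot connect to $host:$port: $!\n";
    $remote->autoflush(1);
    print $remote "GET $doc HTTP/1.1" . $EOL;
    print $remote "Host: $host" . $EOL;
    # Ask the server to close the connection so reading until EOF terminates.
    print $remote "Connection: close" . $BLANK;
    my $content = join '', <$remote>;
    close $remote;
    return $content;
}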

Tip4: If you install the LWP module by running
perl -MCPAN -e"install LWP::Simple"
you can grab a page via:

use LWP::Simple;
my $contents = get("http://www.theparticle.com/");
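Once a page has been fetched (with webget, LWP::Simple's get, or a downloaded file), each value can be pulled out with a regular expression. The sketch below assumes a hypothetical layout in which a label cell such as "Last Trade:" is followed by a cell holding the value, possibly wrapped in formatting tags; the real page's markup will differ, so inspect its source and adapt the pattern. The sample row and price are illustrative only.

```perl
use strict;
use warnings;

# Return the text of the table cell that follows a cell containing $label,
# skipping any formatting tags (e.g. <b>) inside the value cell.
# The layout assumed here is hypothetical; inspect the real page and adapt it.
sub extract_after_label {
    my ($html, $label) = @_;
    return $html =~ m{\Q$label\E\s*</td>\s*<td[^>]*>(?:\s*<[^>]+>)*\s*([^<]+?)\s*<}i
        ? $1
        : undef;
}

my $sample = '<tr><td>Last Trade:</td><td class="value"><b>27.35</b></td></tr>';
my $price  = extract_after_label($sample, 'Last Trade:');
print defined $price ? "Last sale price: $price\n" : "Label not found\n";
```

\Q...\E quotes the label so characters such as '.' or '(' in it are matched literally, and the lazy ([^<]+?) stops the capture before trailing whitespace.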

HW8: Using Perl, write an XML parser. You create an XML parser object and give it a file name, and it gives you a parse tree (or an appropriate error, depending on what went wrong).

Note that you are the one writing the parser; don't use an existing library to parse XML. Use regular expressions and a stack to parse.

Your test code should traverse the parse tree and display the original file using only data from the parse tree.

You should also test your code for at least these conditions:

File does not exist, file is not readable, tags don't match, more than one root tag, file is empty, etc.

Note that tags can have attributes, e.g.:

<person fname="John" lname="Doe">
    <education id="1">
        <school name="Brooklyn College">
           <graduated/>
           <credits>120</credits>
        </school>
    </education>
    <education id="2">
        <school name="School of Something">
           <credits>30</credits>
           <major>CIS</major>
        </school>
    </education>
</person>
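A sketch of the regex-plus-stack approach. The package name SimpleXMLParser, the methods parse_file, parse_string and to_xml, and the node layout are all hypothetical choices. Each element becomes a hash holding its name, attributes (plus their original order), a self-closing flag, and a list of children; text is kept as plain strings so the file can be printed back from the tree alone. It only accepts double-quoted attributes and does not handle a prolog, comments, or CDATA.

```perl
package SimpleXMLParser;   # hypothetical package name
use strict;
use warnings;

sub new { my ($class) = @_; return bless {}, $class }

# Read and parse a file; dies with a descriptive message on any error.
sub parse_file {
    my ($self, $file) = @_;
    die "File '$file' does not exist\n"  unless -e $file;
    die "File '$file' is not readable\n" unless -r $file;
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    my $xml = do { local $/; <$fh> };
    close $fh;
    return $self->parse_string($xml);
}

# Element nodes: { name, attrs => {k => v}, order => [k, ...], empty, children }
# Text nodes are plain strings.
sub parse_string {
    my ($self, $xml) = @_;
    die "File is empty\n" unless defined $xml && $xml =~ /\S/;
    my $doc   = { children => [] };    # virtual parent that holds the root
    my @stack = ($doc);                # currently open elements, innermost last
    while ($xml =~ m{\G(?:([^<]+)|<(/?)([A-Za-z_][\w.-]*)((?:\s+[\w.-]+\s*=\s*"[^"]*")*)\s*(/?)>)}gc) {
        my ($text, $closing, $name, $attr_str, $empty) = ($1, $2, $3, $4, $5);
        if (defined $text) {                           # character data
            push @{ $stack[-1]{children} }, $text;
        }
        elsif ($closing) {                             # </name>
            die "Malformed closing tag </$name>\n" if $empty || $attr_str ne '';
            die "Unexpected closing tag </$name>\n" if @stack == 1;
            my $open = pop @stack;
            die "Mismatched tags: <$open->{name}> closed by </$name>\n"
                unless $open->{name} eq $name;
        }
        else {                                         # <name ...> or <name .../>
            my @pairs = $attr_str =~ /([\w.-]+)\s*=\s*"([^"]*)"/g;
            my $node = {
                name     => $name,
                attrs    => {@pairs},
                order    => [ @pairs[ grep { $_ % 2 == 0 } 0 .. $#pairs ] ],
                empty    => $empty ? 1 : 0,
                children => [],
            };
            push @{ $stack[-1]{children} }, $node;
            push @stack, $node unless $empty;
        }
    }
    my $pos = pos($xml) // 0;
    die "Syntax error at character $pos\n" if $pos != length $xml;
    die "Unclosed tag <$stack[-1]{name}>\n" if @stack > 1;
    my @roots = grep { ref($_) } @{ $doc->{children} };
    die "No root element\n"            unless @roots;
    die "More than one root element\n" if @roots > 1;
    die "Text outside the root element\n"
        if grep { !ref($_) && /\S/ } @{ $doc->{children} };
    return $roots[0];
}

# Rebuild XML text from a parse tree, using only data stored in the tree.
sub to_xml {
    my ($node) = @_;
    return $node unless ref $node;                     # text node
    my $attrs = join '', map { qq( $_="$node->{attrs}{$_}") } @{ $node->{order} };
    return "<$node->{name}$attrs/>" if $node->{empty};
    return "<$node->{name}$attrs>"
         . join('', map { to_xml($_) } @{ $node->{children} })
         . "</$node->{name}>";
}

package main;

# Demo: parse a small document and print it back from the tree.
my $parser = SimpleXMLParser->new;
my $tree = eval { $parser->parse_string('<person fname="John"><graduated/><credits>120</credits></person>') };
print defined $tree ? SimpleXMLParser::to_xml($tree) . "\n" : "Error: $@";
```

Errors are reported with die, so callers can wrap parse_file in eval and inspect $@, which is how each of the required failure cases can be tested.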

HW9: Create an object/module ``Complex.pm'' that represents a complex number. You should overload addition, subtraction, multiplication, and division, as well as provide a to-string conversion. Also, write a simple program to test your class, e.g., implement the Fourier transform (the simple, slow version). Using your code, I should be able to write the following (and likewise for the other operators besides addition):

use Complex;
$a = new Complex(10,14);
$b = new Complex(12,34);
$c = $a + $b;
print "$a + $b = $c\n";
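A sketch using Perl's overload pragma, followed by the naive O(n^2) discrete Fourier transform as the test program. It is written as one file so it runs as-is; to `use Complex` as in the example, move the `package Complex` part into Complex.pm (ending it with a true value such as `1;`) somewhere on @INC. The helper _lift, which lets plain numbers mix with Complex objects, is a hypothetical name.

```perl
package Complex;
use strict;
use warnings;
use Scalar::Util qw(blessed);
use overload
    '+'  => \&add,
    '-'  => \&subtract,
    '*'  => \&multiply,
    '/'  => \&divide,
    '""' => \&to_string;

sub new {
    my ($class, $re, $im) = @_;
    return bless { re => $re // 0, im => $im // 0 }, $class;
}

# Promote a plain number to a Complex so mixed expressions like 2 * $z work.
sub _lift {
    my ($x) = @_;
    return blessed($x) && $x->isa('Complex') ? $x : Complex->new($x, 0);
}

# overload calls each handler as ($self, $other, $swapped); $swapped is true
# when the Complex object was the right-hand operand, as in 1 - $z.
sub add {
    my ($x, $y) = @_;
    $y = _lift($y);
    return Complex->new($x->{re} + $y->{re}, $x->{im} + $y->{im});
}

sub subtract {
    my ($x, $y, $swapped) = @_;
    $y = _lift($y);
    ($x, $y) = ($y, $x) if $swapped;
    return Complex->new($x->{re} - $y->{re}, $x->{im} - $y->{im});
}

sub multiply {
    my ($x, $y) = @_;
    $y = _lift($y);
    return Complex->new($x->{re} * $y->{re} - $x->{im} * $y->{im},
                        $x->{re} * $y->{im} + $x->{im} * $y->{re});
}

# (a + bi) / (c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2)
sub divide {
    my ($x, $y, $swapped) = @_;
    $y = _lift($y);
    ($x, $y) = ($y, $x) if $swapped;
    my $d = $y->{re}**2 + $y->{im}**2;
    die "Complex division by zero\n" if $d == 0;
    return Complex->new(($x->{re} * $y->{re} + $x->{im} * $y->{im}) / $d,
                        ($x->{im} * $y->{re} - $x->{re} * $y->{im}) / $d);
}

sub to_string {
    my ($z) = @_;
    my ($re, $im) = ($z->{re}, $z->{im});
    return $im < 0 ? "($re - " . (-$im) . "i)" : "($re + ${im}i)";
}

package main;

# Naive O(n^2) discrete Fourier transform:
#   X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/n)
sub dft {
    my @x  = @_;
    my $n  = @x;
    my $pi = 4 * atan2(1, 1);
    my @X;
    for my $k (0 .. $n - 1) {
        my $sum = Complex->new(0, 0);
        for my $t (0 .. $n - 1) {
            my $angle = -2 * $pi * $k * $t / $n;
            $sum = $sum + $x[$t] * Complex->new(cos($angle), sin($angle));
        }
        push @X, $sum;
    }
    return @X;
}

my $u = Complex->new(10, 14);
my $v = Complex->new(12, 34);
my $w = $u + $v;
print "$u + $v = $w\n";
print "DFT of (1, 2, 3, 4): ", join(', ', dft(1, 2, 3, 4)), "\n";
```

The first line printed is (10 + 14i) + (12 + 34i) = (22 + 48i). Because no handler is given for operators such as '.' or 'eq', overload derives them from the '""' conversion.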

HW10: Implement a Naive Bayes classifier for websites. Have it classify documents into one of two categories (`commercial' or not); you can have more than those if you want. You feed the program a category and a URL (for `learning'). To classify, you just give it a URL, and it outputs the probability of the page being in each of the categories. Set up your code to maintain the statistics between runs (so it does not re-learn everything every time the program runs).
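A sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing, computed in log space to avoid underflow. Statistics are saved with the core Storable module so nothing is re-learned; the file name bayes.db and the learn/classify command-line interface are hypothetical. Pages are fetched with LWP::Simple (see Tip4 of HW7), loaded only when a URL is actually given. Tokenization (strip tags, keep lower-cased words of two or more letters) is a deliberately crude assumption.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

my $DB_FILE = 'bayes.db';    # hypothetical name of the saved statistics file

# Strip HTML tags and return the lower-cased words (2+ letters) of a page.
sub tokenize {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>/ /g;
    return map { lc $_ } $text =~ /([A-Za-z]{2,})/g;
}

# Update the counts in $model with one training document.
sub train {
    my ($model, $category, $html) = @_;
    $model->{docs}{$category}++;                 # documents per category
    for my $word (tokenize($html)) {
        $model->{words}{$category}{$word}++;     # word counts per category
        $model->{total}{$category}++;            # total words per category
        $model->{vocab}{$word} = 1;              # global vocabulary
    }
}

# Returns { category => probability } for the given page.
sub classify {
    my ($model, $html) = @_;
    my @cats = sort keys %{ $model->{docs} || {} };
    return {} unless @cats;
    my $ndocs = 0;
    $ndocs += $model->{docs}{$_} for @cats;
    my $vocab = scalar(keys %{ $model->{vocab} || {} }) || 1;
    my @words = tokenize($html);
    my %logp;
    for my $c (@cats) {
        my $lp    = log($model->{docs}{$c} / $ndocs);          # log prior
        my $denom = ($model->{total}{$c} // 0) + $vocab;
        for my $w (@words) {
            my $count = $model->{words}{$c}{$w} // 0;
            $lp += log(($count + 1) / $denom);                 # log likelihood
        }
        $logp{$c} = $lp;
    }
    # Turn log scores into probabilities that sum to 1.
    my ($max) = sort { $b <=> $a } values %logp;
    my %p = map { ($_ => exp($logp{$_} - $max)) } @cats;
    my $z = 0;
    $z += $_ for values %p;
    $_ /= $z for values %p;
    return \%p;
}

# Load previously learned statistics so nothing has to be re-learned.
my $model = -e $DB_FILE ? retrieve($DB_FILE) : {};

# Usage: perl bayes.pl learn CATEGORY URL   or   perl bayes.pl classify URL
if (@ARGV) {
    require LWP::Simple;
    my $cmd = shift @ARGV;
    if ($cmd eq 'learn' && @ARGV == 2) {
        my ($cat, $url) = @ARGV;
        my $html = LWP::Simple::get($url) // die "Cannot fetch $url\n";
        train($model, $cat, $html);
        store($model, $DB_FILE);
    }
    elsif ($cmd eq 'classify' && @ARGV == 1) {
        my $html = LWP::Simple::get($ARGV[0]) // die "Cannot fetch $ARGV[0]\n";
        my $p = classify($model, $html);
        printf "%-15s %.4f\n", $_, $p->{$_} for sort keys %$p;
    }
    else {
        die "Usage: $0 learn CATEGORY URL | classify URL\n";
    }
}
```

Subtracting the largest log score before exponentiating keeps exp() from underflowing to zero on long pages, and the final division normalizes the scores into probabilities.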