May 25th, 2024

CISC 3140 3 1165 ET6

Notes
0001

Files
darts.html
20120911.html
DatabaseNotes
SQL Intro
More SQL
Lorenz Attractor
20121002
20121009
20121016

MIDTERM (due 20121113)

datamining.pdf

probmining.pdf

matmining.pdf

ML applets

## Homeworks

HW1: Send me an email with your name.

HW2: Do the "Sample Questions" at the end of sql2.pdf. For the same database, also answer the following questions:

1) Find the company with most employees.

2) Find employees who make more than the average salary within their company.

3) Find employees who make more than the median salary within their company.

4) Find employees whose salary is an outlier (more than 2 standard deviations above the mean) within their company.

5) Find employees whose salary is an outlier (above the 95th percentile) within their company.

6) Find the company with the highest number of outlying salaries (your choice which outlier to use).

7) Find the company with most non-managing employees.

8) Find the company with highest average difference between manager salary and non-manager employee salary.

9) Assume that each non-managing employee generates around 2x their salary in revenue. Managing employees don't directly contribute to revenue. Estimate revenue and "profit" for each company (assume profit = revenue - all_salaries).

10) Calculate salary skew for profitable (profit > 0) companies from question 9.

Write answers in an email; put "CISC 3140 HW2" in email subject.
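
To make the shape of these queries concrete, here is a minimal sketch of questions 2 and 9, run against an in-memory SQLite database from Python. The `employee` table, its columns, and the sample rows are assumptions made up for illustration; the real schema is the one in sql2.pdf and will likely differ, and PostgreSQL offers extra aggregates that SQLite lacks.

```python
import sqlite3

# Hypothetical schema: the real tables are defined in sql2.pdf and may differ.
SCHEMA = """
CREATE TABLE employee (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    company    TEXT NOT NULL,
    salary     REAL NOT NULL,
    is_manager INTEGER NOT NULL   -- 1 for managers, 0 otherwise
);
"""

# Made-up sample data, only here so the queries return something.
ROWS = [
    (1, "Ann", "Acme", 100.0, 1),
    (2, "Bob", "Acme", 60.0, 0),
    (3, "Cy", "Acme", 50.0, 0),
    (4, "Dee", "Bolt", 90.0, 1),
    (5, "Eli", "Bolt", 30.0, 0),
]

# Question 2: join each employee to the average salary of their own company.
ABOVE_AVERAGE = """
SELECT e.name, e.company, e.salary
FROM employee e
JOIN (SELECT company, AVG(salary) AS avg_salary
      FROM employee
      GROUP BY company) a
  ON a.company = e.company
WHERE e.salary > a.avg_salary
ORDER BY e.company, e.name;
"""

# Question 9: revenue = 2 x salary of non-managers; profit = revenue - all salaries.
PROFIT = """
SELECT company,
       SUM(CASE WHEN is_manager = 0 THEN 2 * salary ELSE 0 END) AS revenue,
       SUM(CASE WHEN is_manager = 0 THEN 2 * salary ELSE 0 END) - SUM(salary) AS profit
FROM employee
GROUP BY company
ORDER BY company;
"""


def make_db():
    """Create the in-memory database and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def above_average(conn):
    return conn.execute(ABOVE_AVERAGE).fetchall()


def profit_by_company(conn):
    return conn.execute(PROFIT).fetchall()
```

With the sample rows above, `above_average(make_db())` returns Ann (Acme) and Dee (Bolt), and `profit_by_company` shows Acme with a profit of 10 and Bolt with a loss of 60. The same per-company subquery pattern extends to the median and outlier questions, although those need functions that depend on the database in use.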

HW3: This homework is to prepare the environment for HW4, so try to get it done by next week. Download and install Apache (http://httpd.apache.org/), PostgreSQL (http://www.postgresql.org/), and PHP (http://www.php.net/). Search for how to create a phpinfo page and create one; ensure that PHP works under Apache and that phpinfo lists the PostgreSQL libraries. Set up PostgreSQL (create a database and a user, and give ownership of the database to that user). Keep the rest of the configuration as close to the defaults as possible. Make sure you remember the administrator passwords and the paths where everything is installed (e.g. know where Apache is installed, where PostgreSQL keeps its database files and configuration files, etc.).

HW4: Set up a publicly accessible website anywhere, with Apache/PHP and a database, and implement a database-enabled guestbook, similar to the one we did in class (the code is posted under the 20121009.zip link). Send me a link to the working website where I can sign your guestbook and download the source code. Suggestions for where to set up your own website: any web hosting company (there are some free ones out there); for ~$20 you can get http://www.linode.com/
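
The data layer of a guestbook like the one in HW4 can be sketched in a few lines. This is not the class code from 20121009.zip: it uses Python and SQLite as stand-ins for PHP and PostgreSQL, and the table name, column names, and function names are all hypothetical. The two ideas that carry over are parameterized inserts and escaping entries before they go into the page.

```python
import html
import sqlite3


def open_guestbook(path=":memory:"):
    """Open the database and create the (hypothetical) guestbook table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS guestbook ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " message TEXT NOT NULL,"
        " created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def sign(conn, name, message):
    """Store one entry; the ? placeholders keep user input out of the SQL text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO guestbook (name, message) VALUES (?, ?)",
            (name, message),
        )


def entries(conn):
    """Return all entries, oldest first."""
    return conn.execute(
        "SELECT name, message FROM guestbook ORDER BY id"
    ).fetchall()


def render(conn):
    """Build HTML for the entries, escaping anything the visitor typed."""
    return "\n".join(
        f"<p><b>{html.escape(name)}</b>: {html.escape(message)}</p>"
        for name, message in entries(conn)
    )
```

For example, after `conn = open_guestbook()` and `sign(conn, "Ann", "hello")`, `render(conn)` produces one paragraph for Ann. A message containing markup such as `<script>` is stored as typed but comes back escaped in the rendered page.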

HW5: Same as HW4, except make your guestbook password protected. Create a signup page where a user can create an account on your website, then don't let anyone sign the guestbook unless they're logged in. As before, send me an email with the website link and source code.
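
One way to think about the signup and login logic in HW5 is sketched below. It is an illustration under assumptions, not a required design: it is written in Python instead of PHP, a dict stands in for both the users table and the session, and the function names and iteration count are made up. The point is that the site stores a salted hash, never the password itself, and checks the session before accepting a guestbook entry.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def hash_password(password, salt=None):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Re-hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def sign_up(users, username, password):
    """Create an account; `users` is a dict standing in for a users table."""
    if username in users:
        return False
    users[username] = hash_password(password)
    return True


def log_in(users, username, password):
    """Return True only if the account exists and the password matches."""
    record = users.get(username)
    if record is None:
        return False
    salt, digest = record
    return verify_password(password, salt, digest)


def can_sign(session):
    """Guestbook gate; `session` is a dict standing in for the web session."""
    return session.get("user") is not None
```

A login handler would call `log_in`, and on success put the username into the session; the guestbook form handler would then refuse the entry whenever `can_sign(session)` is false.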

HW6: Read Naive_Bayes_classifier (wiki), and implement a program to classify files. Create two folders, one containing files with contents from your emails, and one containing files with contents of your spam emails. Train a Bayes classifier using those two folders. Then classify any random pieces of text using that classifier. Extra point for incorporating this into your guestbook, where the guestbook displays a score of how likely that entry is to be spam.
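
As a rough guide to what HW6 involves, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The class name, method names, tokenizer, and folder layout are assumptions for illustration, and the smoothing choice is one common option, not the only valid one.

```python
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, label, text):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def train_folder(self, label, folder):
        """Train on every file in a folder (e.g. one folder per class)."""
        for path in Path(folder).iterdir():
            if path.is_file():
                self.train(label, path.read_text(errors="ignore"))

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        words = tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                # Add-one smoothing so unseen words do not zero out the product.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

    def probability(self, label, text):
        """Normalize the scores into a probability for one trained label."""
        scores = self.log_scores(text)
        top = max(scores.values())
        weights = {k: math.exp(s - top) for k, s in scores.items()}
        return weights[label] / sum(weights.values())
```

Typical use would be `nb.train_folder("spam", "spam/")` and `nb.train_folder("ham", "ham/")` on two folders of your own (the folder names here are placeholders), followed by `nb.classify(text)`. For the guestbook extra point, `nb.probability("spam", entry)` gives a score between 0 and 1 that could be displayed next to each entry. The classifier must be trained on at least one document before classifying.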