CISC 7510X (DB1) Homeworks
You should EMAIL me homeworks, alex at theparticle dot com. Start email subject with "CISC 7510X HW#". Homeworks without the subject line risk being deleted and not counted.
CISC 7510X HW# 1 (due by 2nd class;): Email me your name, prefered email address, IM account (if any), major, and year. (oh, and install PostgreSQL).
CISC 7510X HW# 2 (due by 3rd class;): For the below `store' schema:
Also, install PostgreSQL.
CISC 7510X HW# 3 (due by 4th class;): Install PostgreSQL.
Where doorid represents the door for this event. e.g. Front door may be doorid=1, and bathroom may be doorid=2, etc. tim is timestamp, username is the user who is opening or closing the door. event is "E" for entry, and "X" for exit.
Using SQL, answer these questions (write a SQL query that answers these questions):
CISC 7510X HW# 4 (due by Nth class;): Write a command line program to "join" .csv files. Use any programming language you're comfortable with. Your program should work similarly to the unix "join" utility (google for it). Unlike the unix join, your program will not require files to be sorted on the key. Your program must also accept the "type" of join to use---merge join, inner loop join, or hash join, etc. Test your program on "large" files (e.g. make sure it doesn't blow up on one million records, etc.)
Submit source code for the program.
Also... load all files in ctsdata.20140211.tar (link on the left) into Oracle or Postgres (or whichever works for you). The format of these files is: cts(date,symbol,open,high,low,close,volume), splits(date,symbol,post,pre), dividend(date,symbol,dividend). Submit (email) whatever commands/files you used to load the data into whatever database you're using, as well as the raw space usage of the tables in your database.
CISC 7510X HW# 5 (due by Nth class): If you haven't done so already, load all files in ctsdata.20140211.tar (link on the left) into Oracle or PostgreSQL (or whichever works for you; postgresql recommended!). The format of these files is: cts(date,symbol,open,high,low,close,volume), splits(date,symbol,post,pre), dividend(date,symbol,dividend). Submit (email) whatever commands/files you used to load the data into whatever database you're using, as well as the raw space usage of the tables in your database. (this was part of previous homework).
After loading the data, create another table DAILY_PRCNT, with fields: TDATE,SYMBOL,PRCNT which will have the daily percentage gain/loss adjusted for dividends and splits.
Do NOT write procedural code (Java, C#, C/C++, etc.) for this homework (all code must be SQL, etc.).
HINT: MSFT (Microsoft) on 2004-11-12 closed at 29.97.
HINT: splits, MSFT did a 1 to 2 split on 2003-02-18. During a split, each share of a company gets turned into several shares of lower value each. The total value held by investors is not changed.
Submit query used to construct the DAILY_PRCNT table (e.g. "create table DAILY_PRCNT as select ..."). We'll do more stuff with this DAILY_PRCNT dataset in subsequent homeworks---so don't put it off and get it done on time.
CISC 7510X HW# 6 (due by Nth class): Your buddy stops over for lunch and tells you about his wonderful idea of building software for junk yards. Junk yards are places that aquire cheap old cars and sell individual parts---a $1k old junky car may have 100 parts in it that each can be sold for $20-$50, etc. A typical junk yard may have dozens to hundreds of old cars, and if you need a part, you drive by and ask... the attendant would know what car/part you're looking for and would know whether they have anything compatible in the inventory. (e.g. a "left side mirror from a white 2013 Ford Mustang" may be repainted to be compatible with a red 2014 Ford Mustang, etc.).
Now, the attendant would likely know these things (they have enormous domain knowledge). But it's still a major inventory hassle to find compatible parts... Your buddy has an idea of building such an `inventory management system' for junk yards... so anyone can start a junk yard, and junk yards can get much bigger. The idea is that the customer would drive in, type in the car/part they're looking for, and the system would tell them if there's a compatible car/part available (and where it is), or can be made compatible with minor tweaks (such as repainting, etc.). If part is not available locally, the software should be internet enabled to find the compatible parts in other junk yards running the same software. License per junk yard, $20k, with $2k/year maintenance, and your buddy thinks he can immediately sell it to at least 10 junk yards near major city centers, and perhaps a few hundred over the next few years. So now you have a case for a lucrative business... your task is to build it.
Go through the process of designing this inventory system. What are objects? What are events? Create a database schema, etc. How would the search process work? (e.g. go through the motions of: new junky car arrives, how is it inventoried? new customer arrives looking for a part, how does the system find a compatible part? where can humans be eliminated from this process?).
Submit writeup of the design (nothing too complicated, just a 1 page description---something that would convince me that you're the right contractor for this project---that you know what you're doing). Also submit database schema (DDL, create table statements), and query statements/process to find a compatible part.
CISC 7510X HW# 7 (due by Nth class;): Doing something useful with the data from HW5: Background: Pairs trading. Using the percentage returns table you built in HW5: Your task is to identify potential symbol pairs that have HIGH correlation, and are suitable for pairs trading. While everyone agrees that this strategy works, nobody agrees on the best way to identify correlation---especially when considered in relation to the rest of the market.
For this homework, feel free to use whatever you think is appropriate for correlation (if not sure, try Pearson; Take a log of the percentage gain, and apply pearson on top of that. Yes, you can do all this in SQL.).
Submit 10 "best" symbol pairs, each of which trades at least ~$10m a day, suitable for pairs trading in December 2013 (yah, I know it's an old date). Along with the pairs, submit their correlation coefficients for previous year, and the month of December 2013. (assume you were trading $1m worth, and you traded those exact 10 pairs, how much would you have gained/lost during that period?). Also submit the sql code to get those 10 symbols from the dataset.
CISC 7510X HW# 8: Given a database table such as: create table file(id int, parentid int, name varchar(1024), size int, type char(1)); write a single (recursive) database query to list FULL PATH of all files. [assume type is either "F" or "D" for file or directory]. Your query should give you similar output to unix command: "find . -type f".
Submit query in an email; put "CISC 7510X HW8" in email subject.