CISC 7700X Homeworks
Email homeworks to me at alex at theparticle dot com. Start the email subject with "CISC 7700X HW#". Homeworks without that subject line risk being deleted and not counted.
CISC 7700X HW# 1 (due by 2nd class;): Email me your name, preferred email address, IM account (if any), major, and year.
CISC 7700X HW# 2 (due by 3rd class;):
CISC 7700X HW# 3 (due by 4th class;): We have a labeled training data set: hw3.data1.csv.gz.
Thinking of a linear model, we come up with:
y = 24*column1 + -15*column2 + -38*column3 + -7*column4 + -41*column5 + 35*column6 + 0*column7 + -2*column8 + 19*column9 + 33*column10 + -3*column11 + 7*column12 + 3*column13 + -47*column14 + 26*column15 + 10*column16 + 40*column17 + -1*column18 + 3*column19 + 0*column20 + -6
If y > 0, predict 1; otherwise predict -1.
What is the accuracy? Calculate the confusion matrix for this model. If cost of a false negative is $1000, and cost of a false positive is $100, (and $0 for an accurate answer), what is the expected economic gain?
How can we tweak the model to increase economic gain? Come up with a model that maximizes economic gain.
Email the numbers and the steps you used to calculate things (you can do most of this homework in a spreadsheet [Excel?], but feel free to write code).
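The evaluation steps for HW# 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the two-feature toy rows and weights are made up (they are not hw3.data1.csv.gz or the 20-column model), and "gain" is treated simply as negative total cost.

```python
# Hypothetical helpers; labels are +1 / -1 as in the assignment.

def predict(weights, bias, row):
    """Linear score thresholded at 0: 1 if score > 0, otherwise -1."""
    score = sum(w * x for w, x in zip(weights, row)) + bias
    return 1 if score > 0 else -1

def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for labels in {1, -1}."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["tn" if t == -1 else "fn"] += 1
    return counts

def accuracy(cm):
    """Fraction of examples on the diagonal of the confusion matrix."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())

def total_cost(cm, fn_cost=1000, fp_cost=100):
    """Dollar cost of the mistakes; correct answers cost $0."""
    return cm["fn"] * fn_cost + cm["fp"] * fp_cost

# Made-up toy data with 2 features (NOT the homework data set).
rows = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (0.5, 2.0), (3.0, 0.0)]
y_true = [1, -1, 1, 1, -1]
weights, bias = [2.0, -1.0], -0.5

y_pred = [predict(weights, bias, r) for r in rows]
cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm), -total_cost(cm), -total_cost(cm) / len(rows))
```

Because a false negative costs ten times as much as a false positive here, one thing worth trying is shifting the constant term (the threshold) so the model predicts 1 more often, then recomputing the cost.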
CISC 7700X HW# 4 (due by Nth class;):
Using data from stockrow for the previous 2 years (excluding the latest quarter!), build linear [y = a + b*x], logarithmic [y = a + b*log(x)], exponential [y = b*exp(a*x)], and power curve [y = b*x^a] models on revenue, earnings, and dividends, for symbols IBM, MSFT, AAPL, GOOG, FB, PG, GE.
Which model works best for which metric/symbol? Show with numbers (e.g., r-squared score). Read through: Coefficient of determination.
Using the best model for each metric, make a prediction for "next quarter" revenue, earnings, and dividends. Remember, you didn't use the last number to build your models. Compare your model's prediction to the last quarter's number. What's the error?
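One possible way to fit the four curves is to reduce each to a straight-line fit by taking logs where needed. The sketch below assumes that approach (so the exponential and power fits minimize error on the log scale, and x and y must be positive); the function names are hypothetical and the quarterly numbers are made up, not stockrow data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(ys, preds):
    """Coefficient of determination, computed on the original scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def fit_models(xs, ys):
    """Return {name: prediction function} for the four model families."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    a1, b1 = fit_line(xs, ys)   # y = a + b*x
    a2, b2 = fit_line(lx, ys)   # y = a + b*log(x)
    c3, a3 = fit_line(xs, ly)   # log(y) = log(b) + a*x
    c4, a4 = fit_line(lx, ly)   # log(y) = log(b) + a*log(x)
    return {
        "linear": lambda x: a1 + b1 * x,
        "logarithmic": lambda x: a2 + b2 * math.log(x),
        "exponential": lambda x: math.exp(c3) * math.exp(a3 * x),
        "power": lambda x: math.exp(c4) * x ** a4,
    }

# Made-up quarterly values (quarter index 1..8); the last one is held out.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
train_x, train_y = xs[:-1], ys[:-1]

models = fit_models(train_x, train_y)
scores = {name: r_squared(train_y, [f(x) for x in train_x])
          for name, f in models.items()}
best = max(scores, key=scores.get)
prediction = models[best](xs[-1])
error = prediction - ys[-1]
print(scores, best, prediction, error)
```

The toy series is exactly linear so the result is easy to check by eye; real revenue or earnings series will not be this clean, and negative earnings would rule out the log-based fits as written.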
CISC 7700X HW# 5 (due by Nth class;):
Using data from: spambase, build a Naive Bayes email classifier. Nothing too fancy, just a training module, and a classifier module. Submit code and accuracy you get on the spambase dataset.
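A training module and a classifier module might look roughly like the sketch below. It assumes a Gaussian Naive Bayes variant (one reasonable choice for continuous features, not necessarily the intended one); the function names and the tiny two-feature data set are made up for illustration and are not spambase.

```python
import math
from collections import defaultdict

def train(rows, labels):
    """Training module: per-class log prior, feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # The small constant keeps the variance from being exactly zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model

def classify(model, row):
    """Classifier module: pick the class with the highest log posterior."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for x, m, v in zip(row, means, variances):
            # Log of the Gaussian density, summed over features
            # (the "naive" independence assumption).
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy data: label 1 = spam, 0 = not spam.
rows = [(0.9, 0.8), (1.0, 0.7), (0.8, 0.9), (0.1, 0.2), (0.0, 0.1), (0.2, 0.0)]
labels = [1, 1, 1, 0, 0, 0]

model = train(rows, labels)
correct = sum(classify(model, r) == y for r, y in zip(rows, labels))
train_accuracy = correct / len(rows)
print(train_accuracy, classify(model, (0.95, 0.85)), classify(model, (0.05, 0.1)))
```

Working in log space avoids underflow when many per-feature probabilities are multiplied together. Accuracy measured on the training rows is optimistic; holding out part of the data gives a fairer number.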
CISC 7700X HW# 6 (due by Nth class;):
In this homework you'll build a document clustering mechanism. You can use data from HW# 5 (emails) or you can scrape a few news sites (e.g.: wget -r -l 2 http://www.cnn.com, etc.). Convert each document into an array of numbers (read: TF-IDF).
Submit code used to do clustering, as well as assigning a category to a new document.
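The pipeline can be sketched as: tokenize, build TF-IDF vectors, cluster, then assign a new document to the nearest centroid. The version below is a minimal illustration that assumes k-means with cosine similarity and a simple deterministic initialization; the function names and the four toy documents are made up, and idf = log(N / df) is just one common TF-IDF variant.

```python
import math
from collections import Counter

def build_idf(token_lists):
    """Inverse document frequency: log(N / number of docs containing the word)."""
    n = len(token_lists)
    df = Counter(word for tokens in token_lists for word in set(tokens))
    return {word: math.log(n / count) for word, count in df.items()}

def tfidf_vector(tokens, idf, vocab):
    """Array of numbers for one document; words outside vocab are ignored."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total * idf[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest(vector, centroids):
    """Index of the centroid most similar to the vector."""
    return max(range(len(centroids)), key=lambda i: cosine(vector, centroids[i]))

def kmeans(vectors, k, iterations=10):
    # Assumed initialization: evenly spaced documents as starting centroids.
    centroids = [vectors[i * len(vectors) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

docs = ["stocks market shares trading", "market shares investors stocks",
        "football match goal team", "team goal league football"]
token_lists = [d.split() for d in docs]
idf = build_idf(token_lists)
vocab = sorted(idf)
vectors = [tfidf_vector(t, idf, vocab) for t in token_lists]

centroids = kmeans(vectors, k=2)
labels = [nearest(v, centroids) for v in vectors]
new_label = nearest(tfidf_vector("investors trading stocks today".split(), idf, vocab), centroids)
print(labels, new_label)
```

Real documents need more careful tokenization (lowercasing, stripping markup, dropping stop words), and random restarts are usually worth adding since k-means depends on its starting centroids.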
CISC 7700X HW# 7:
For the Iris flower data set (link), build a decision tree to classify iris plants (pick 70% of the data for training, and use the remaining 30% for testing). Submit code and the accuracy you get. Feel free to use spark.ml.
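For anyone writing the tree by hand rather than using spark.ml, the sketch below shows one way to do a 70/30 split and grow a small tree. It assumes Gini impurity with midpoint thresholds (one common choice); the function names are hypothetical and the two-class toy data is made up, so it is not the Iris data set.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=5):
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def classify(tree, row):
    # Internal nodes are tuples; leaves are class labels (strings here).
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Made-up, well-separated toy data with two classes.
rows = ([(1.0 + 0.1 * i, 0.1 + 0.05 * i) for i in range(10)]
        + [(5.0 + 0.1 * i, 2.0 + 0.05 * i) for i in range(10)])
labels = ["a"] * 10 + ["b"] * 10

indices = list(range(len(rows)))
random.Random(0).shuffle(indices)
cut = len(indices) * 7 // 10          # 70% for training
train_idx, test_idx = indices[:cut], indices[cut:]

tree = build_tree([rows[i] for i in train_idx], [labels[i] for i in train_idx])
hits = sum(classify(tree, rows[i]) == labels[i] for i in test_idx)
test_accuracy = hits / len(test_idx)
print(tree, test_accuracy)
```

The toy classes are far apart, so a single split separates them; Iris has three classes and some overlap, so expect a deeper tree and less than perfect accuracy there.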
CISC 7700X HW# 8 (due by Nth class;):
Build an auto-encoder for:
00000001, 00000010, 00000100, 00001000, 00010000, 00100000, 01000000, 10000000. The middle layer should have 3 neurons. So you'll have a neural network with 8 binary inputs, 3 inner neurons, and 8 binary outputs.
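A bare-bones 8-3-8 network could be set up along the lines below. This is only a sketch under assumptions not stated in the assignment: sigmoid units in both layers, squared-error loss, plain per-pattern gradient descent, and hypothetical names and hyperparameters (learning rate, epochs, seed). It shows the mechanics and is not tuned to guarantee perfect reconstruction.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Return (hidden activations, output activations) for one input pattern."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b) for row, b in zip(w2, b2)]
    return hidden, output

def total_loss(patterns, params):
    """Sum of squared reconstruction errors over all patterns."""
    return sum(sum((o - t) ** 2 for o, t in zip(forward(p, *params)[1], p))
               for p in patterns)

def train(patterns, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n = len(patterns[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n)]
    b2 = [0.0] * n
    for _ in range(epochs):
        for x in patterns:  # the target equals the input for an auto-encoder
            hidden, output = forward(x, w1, b1, w2, b2)
            # Error signal at each output unit (constant factor 2 folded into lr).
            d_out = [(o - t) * o * (1 - o) for o, t in zip(output, x)]
            # Back-propagate to the hidden layer before w2 is changed.
            d_hid = [h * (1 - h) * sum(d_out[k] * w2[k][j] for k in range(n))
                     for j, h in enumerate(hidden)]
            for k in range(n):
                for j in range(n_hidden):
                    w2[k][j] -= lr * d_out[k] * hidden[j]
                b2[k] -= lr * d_out[k]
            for j in range(n_hidden):
                for i in range(n):
                    w1[j][i] -= lr * d_hid[j] * x[i]
                b1[j] -= lr * d_hid[j]
    return w1, b1, w2, b2

# The eight one-hot patterns from the assignment.
patterns = [[1.0 if i == j else 0.0 for i in range(8)] for j in range(8)]

initial_loss = total_loss(patterns, train(patterns, epochs=0))  # untrained weights
params = train(patterns)
final_loss = total_loss(patterns, params)
hidden, output = forward(patterns[0], *params)
print(initial_loss, final_loss, [round(h, 2) for h in hidden])
```

Printing the three hidden activations for each pattern is a useful check: if training goes well, the network tends to find a roughly binary 3-bit code for the 8 inputs. More epochs or a different seed may be needed to get there.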