CISC 7700X Homeworks
You should EMAIL me homeworks, alex at theparticle dot com. Start email subject with "CISC 7700X HW#". Homeworks without the subject line risk being deleted and not counted.
CISC 7700X HW# 1 (due by 2nd class;): Email me your name, preferred email address, IM account (if any), major, and year.
CISC 7700X HW# 2 (due by 2nd class;): Using the Iris dataset, build a kNN model to identify the species of a flower given sepal_length, sepal_width, petal_length, petal_width. Feel free to use whatever language/tool you are comfortable with. I encourage you to write C/C++/Java/C#/SQL/Python code. You may also use Excel, Weka, Colab, or whatever other library/tool you find. Submit the model code via email.
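As a starting point, here is a minimal kNN sketch in plain Python. It assumes Euclidean distance and a simple majority vote, which the assignment does not require; the function name knn_predict and the six sample rows are illustrative only, and you would load the full Iris dataset instead.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` from `train`, a list of (features, label) pairs."""
    # Keep the k training points closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    # Majority vote among the labels of those k neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Illustrative rows in the order sepal_length, sepal_width, petal_length, petal_width.
# This is NOT the full dataset; it only demonstrates the call.
train = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

print(knn_predict(train, (5.0, 3.4, 1.5, 0.2), k=3))
print(knn_predict(train, (6.5, 3.0, 5.8, 2.2), k=3))
```

To estimate how well the model works, hold out part of the data, predict the held-out rows, and count the fraction predicted correctly. The choice of k is yours to experiment with.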
CISC 7700X HW# 3 (due by 4th class;): We have a labeled training data set: hw3.data1.csv.gz.
Thinking of a linear model, we come up with:
y = 24*column1 + -15*column2 + -38*column3 + -7*column4 + -41*column5 + 35*column6 + 0*column7 + -2*column8 + 19*column9 + 33*column10 + -3*column11 + 7*column12 + 3*column13 + -47*column14 + 26*column15 + 10*column16 + 40*column17 + -1*column18 + 3*column19 + 0*column20 + -6
If y > 0, predict 1; otherwise predict -1.
What is the accuracy? Calculate the confusion matrix for this model. If the cost of a false negative is $1000, and the cost of a false positive is $100 (and $0 for an accurate answer), what is the expected economic gain?
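The bookkeeping for these questions can be sketched as follows. This assumes labels are 1 (positive) and -1 (negative), and it reads "expected economic gain" as the negative of the average cost per example, which is one possible interpretation. The helper names and the eight sample labels are made up for illustration; they are not taken from hw3.data1.csv.gz.

```python
def confusion_matrix(actual, predicted):
    """Return (tp, fp, tn, fn) for labels in {1, -1}, treating 1 as positive."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == -1:
            fp += 1
        elif p == -1 and a == -1:
            tn += 1
        else:  # predicted -1, actual 1
            fn += 1
    return tp, fp, tn, fn

def average_cost(tp, fp, tn, fn, cost_fn=1000, cost_fp=100):
    """Average cost per example; correct answers cost $0."""
    return (fn * cost_fn + fp * cost_fp) / (tp + fp + tn + fn)

# Made-up labels, only to show the calls.
actual    = [1,  1, 1, -1, -1, -1, -1, 1]
predicted = [1, -1, 1, -1,  1, -1, -1, 1]

tp, fp, tn, fn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / len(actual)
print(tp, fp, tn, fn)                    # 3 1 3 1
print(accuracy)                          # 0.75
print(-average_cost(tp, fp, tn, fn))     # -137.5
```

Note that accuracy treats both kinds of error the same, while the cost figure weights a false negative ten times more heavily than a false positive.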
How can we tweak the model to increase economic gain? Come up with a model that maximizes economic gain (approximations are OK; try guesstimating a few possibilities in a spreadsheet, etc.).
Email the numbers and the steps you used to calculate things (you can do most of this homework in a spreadsheet [Excel?], but I highly encourage you to write code---learn Python if not sure where to start).
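One simple tweak to try is moving the decision threshold, which is equivalent to changing the constant term of the model (predicting 1 when y > t is the same as replacing -6 with -6 - t). The sketch below tries each observed score as a threshold and keeps the cheapest one. This is only one possible approach, not the expected answer; the function names, scores, and labels are made up for illustration.

```python
def total_cost(scores, labels, threshold, cost_fn=1000, cost_fp=100):
    """Total cost when predicting 1 for score > threshold, else -1."""
    cost = 0
    for s, label in zip(scores, labels):
        pred = 1 if s > threshold else -1
        if pred == -1 and label == 1:      # false negative
            cost += cost_fn
        elif pred == 1 and label == -1:    # false positive
            cost += cost_fp
    return cost

def best_threshold(scores, labels):
    """Try every observed score (plus one value below them all) as the threshold."""
    candidates = [min(scores) - 1] + sorted(set(scores))
    return min(candidates, key=lambda t: total_cost(scores, labels, t))

# Made-up model outputs (the y values) and true labels.
scores = [-30, -12, -5, 2, 8, 20]
labels = [-1, 1, -1, 1, -1, 1]

print(total_cost(scores, labels, 0))     # 1100 with the original rule y > 0
t = best_threshold(scores, labels)
print(t, total_cost(scores, labels, t))  # -30 200
```

Because a false negative costs ten times as much as a false positive, the cheapest threshold tends to sit lower than 0, accepting more false positives to avoid false negatives.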
CISC 7700X HW# 4 (due by Nth class;):
Using data from: stockrow, using the previous 2 years of data (excluding the latest quarter!), build a linear [y = a+bx ], logarithmic [y = a+b*log(x) ], exponential [ y=b*exp(a*x) ], and power curve [ y=b*x^a ] models on revenue, earnings, and dividends, for symbols IBM, MSFT, AAPL, GOOG, FB, PG, GE.
Which model works best for which metric/symbol? Show with numbers, (e.g. r-squared score, etc.). Read through: Coefficient of determination.
Using the best model for each metric, make a prediction for next quarter's revenue, earnings, and dividends. Remember, you didn't use the last number to build your models. Compare your model's prediction to the last quarter's number. What's the error? [hint]
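A minimal sketch of fitting the four curves, assuming x is the quarter number (1, 2, 3, ...) and that each model is fit by ordinary least squares after a log transform where needed. Fitting in log space is a common shortcut, not the only way, and it minimizes error on the log scale. The helper names and the seven sample values are made up; they are not real financial data. The log-based models need x > 0 and y > 0, so zero dividends or negative earnings need separate handling.

```python
import math

def linfit(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def r_squared(ys, preds):
    """Coefficient of determination, computed on the original y scale."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def fit_models(xs, ys):
    """Return {model name: prediction function} for the four curve types."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    a1, b1 = linfit(xs, ys)    # y = a + b*x
    a2, b2 = linfit(lx, ys)    # y = a + b*log(x)
    c3, a3 = linfit(xs, ly)    # log(y) = log(b) + a*x
    c4, a4 = linfit(lx, ly)    # log(y) = log(b) + a*log(x)
    return {
        "linear": lambda x: a1 + b1 * x,
        "logarithmic": lambda x: a2 + b2 * math.log(x),
        "exponential": lambda x: math.exp(c3) * math.exp(a3 * x),
        "power": lambda x: math.exp(c4) * x ** a4,
    }

# Made-up series growing 10% per quarter; the 8th quarter is held out.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [100 * 1.1 ** (x - 1) for x in xs]

models = fit_models(xs, ys)
scores = {name: r_squared(ys, [f(x) for x in xs]) for name, f in models.items()}
best = max(scores, key=scores.get)
prediction = models[best](8)
print(best, round(prediction, 2))
```

Once you have the held-out actual value, the error is the difference between it and the prediction; reporting it as a percentage of the actual value makes symbols of different sizes comparable.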
CISC 7700X HW# 5 (due by Nth class;):
Using data from: spambase, build a Naive Bayes email classifier. Nothing too fancy, just a training module, and a classifier module. Submit your code and the accuracy you get on the spambase dataset.
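The two modules can be as small as the sketch below. It assumes a Gaussian Naive Bayes model (each numeric feature is modeled as a normal distribution per class), which is one reasonable choice for numeric features such as spambase's; binning the features and counting is another. The function names and the six toy rows are made up for illustration.

```python
import math
from collections import defaultdict

def train(rows, labels):
    """Training module: return {label: (prior, means, variances)}."""
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    model = {}
    for label, group in by_label.items():
        n = len(group)
        columns = list(zip(*group))
        means = [sum(col) / n for col in columns]
        # The small constant keeps the variance from being exactly zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model

def classify(model, row):
    """Classifier module: return the label with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the normal density; summing logs avoids numeric underflow.
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Toy data with two features; label 1 stands for spam, 0 for non-spam.
rows = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (1.5, 2.0), (1.8, 2.2), (1.6, 1.9)]
labels = [0, 0, 0, 1, 1, 1]

model = train(rows, labels)
predictions = [classify(model, r) for r in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(classify(model, (0.1, 0.1)), classify(model, (1.7, 2.0)), accuracy)
```

Accuracy measured on the rows used for training is optimistic; holding out part of the dataset for testing gives a more honest number.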