March 24th, 2023    

CISC 7700X Homeworks

You should EMAIL me homeworks at alex at theparticle dot com. Start the email subject with "CISC 7700X HW#". Homeworks without this subject line risk being deleted and not counted.

CISC 7700X HW# 1 (due by 2nd class): Email me your name, preferred email address, IM account (if any), major, and year.


CISC 7700X HW# 2 (due by 2nd class): Using the Iris dataset, build a kNN model to identify the species of a flower given its sepal_length, sepal_width, petal_length, and petal_width. Feel free to use whatever language/tool you are comfortable with. I encourage you to write C/C++/Java/C#/SQL/Python code. You may also use Excel, Weka, Colab, or any other library/tool you find. Submit the model code via email.
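A minimal from-scratch kNN sketch in Python, under stated assumptions: Euclidean distance, a plain majority vote among the k nearest rows, and k = 3. The function names are illustrative, and the handful of training rows below are sample rows in Iris format (sepal_length, sepal_width, petal_length, petal_width, species); for the homework you would load the full dataset instead.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training rows.

    `train` is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of (features, label) pairs in `test` that kNN labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Sample rows: (sepal_length, sepal_width, petal_length, petal_width), species
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]

print(knn_predict(TRAIN, (5.0, 3.4, 1.5, 0.2)))  # setosa
```

To estimate accuracy honestly, hold out part of the data (or use leave-one-out) and pass the held-out rows as `test`; because the features are unscaled, features with larger ranges dominate the distance.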


CISC 7700X HW# 3 (due by 4th class): We have a labeled training dataset: hw3.data1.csv.gz.
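A sketch for loading the data with Python's standard library. The column layout of hw3.data1.csv.gz is not described on this page, so the code ASSUMES no header row, 20 numeric feature columns (column1 to column20), and the label (1 or -1) in the next column; adjust `parse_rows` if the file differs. Both function names are hypothetical.

```python
import csv
import gzip

def parse_rows(lines, n_features=20):
    """Split CSV lines into (features, labels).

    ASSUMPTION: no header; the first `n_features` columns are features and
    the following column is the label.
    """
    features, labels = [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        features.append([float(v) for v in row[:n_features]])
        labels.append(int(float(row[n_features])))
    return features, labels

def load_rows(path, n_features=20):
    """Open a gzipped CSV in text mode and parse it."""
    with gzip.open(path, "rt", newline="") as f:
        return parse_rows(f, n_features)
```

Usage: `features, labels = load_rows("hw3.data1.csv.gz")`.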

Thinking of a linear model, we come up with:

y = 24*column1 + -15*column2 + -38*column3 + -7*column4 + -41*column5 + 35*column6 + 0*column7 + -2*column8 + 19*column9 + 33*column10 + -3*column11 + 7*column12 + 3*column13 + -47*column14 + 26*column15 + 10*column16 + 40*column17 + -1*column18 + 3*column19 + 0*column20 + -6

If y > 0, predict 1; otherwise, predict -1.

What is the accuracy? Calculate the confusion matrix for this model. If the cost of a false negative is $1000 and the cost of a false positive is $100 (and $0 for an accurate answer), what is the expected economic gain?
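A minimal sketch of the scoring rule, the confusion matrix, and the cost calculation. The weights and intercept are copied from the formula above. It ASSUMES the labels are 1/-1 with 1 as the positive class, and it reports expected gain as the average per prediction (multiply by the number of rows for a total). Helper names are hypothetical.

```python
# Coefficients for column1..column20 and the intercept, from the model above.
WEIGHTS = [24, -15, -38, -7, -41, 35, 0, -2, 19, 33,
           -3, 7, 3, -47, 26, 10, 40, -1, 3, 0]
BIAS = -6

def score(features):
    """y = sum(weight_i * column_i) + intercept."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS

def predict(features, threshold=0):
    """1 if y > threshold, else -1 (threshold 0 is the rule above)."""
    return 1 if score(features) > threshold else -1

def confusion_matrix(actual, predicted):
    """Count TP/FP/TN/FN, treating 1 as positive and -1 as negative."""
    cm = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            cm["TP" if a == 1 else "FP"] += 1
        else:
            cm["TN" if a == -1 else "FN"] += 1
    return cm

def accuracy(cm):
    """Fraction of correct predictions."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())

def expected_gain(cm, fn_cost=1000, fp_cost=100):
    """Average gain per prediction; correct answers earn $0, so gain <= 0."""
    return -(cm["FN"] * fn_cost + cm["FP"] * fp_cost) / sum(cm.values())
```

With `features` and `labels` read from the file, `cm = confusion_matrix(labels, [predict(x) for x in features])` gives the counts, and `accuracy(cm)` and `expected_gain(cm)` give the two numbers asked for.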

How can we tweak the model to increase economic gain? Come up with a model that maximizes economic gain (approximations are OK; try guesstimating a few possibilities in a spreadsheet, etc.).
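Since a false negative costs ten times as much as a false positive, one simple tweak (not the only one; the weights themselves could also be changed) is to move the decision threshold on y, which is the same as changing the intercept -6. This hypothetical helper takes precomputed scores y and tries every distinct cut point, keeping the one with the highest average gain.

```python
def gain_at(scores, actual, threshold, fn_cost=1000, fp_cost=100):
    """Average gain when predicting 1 for score > threshold, else -1."""
    fn = fp = 0
    for s, a in zip(scores, actual):
        pred = 1 if s > threshold else -1
        if pred == -1 and a == 1:
            fn += 1
        elif pred == 1 and a == -1:
            fp += 1
    return -(fn * fn_cost + fp * fp_cost) / len(scores)

def best_threshold(scores, actual, fn_cost=1000, fp_cost=100):
    """Return (threshold, gain) with the highest gain over all distinct cut points."""
    # min(scores) - 1 predicts 1 for every row; each observed score t
    # predicts 1 only for rows scoring strictly above t.
    candidates = [min(scores) - 1] + sorted(set(scores))
    return max(((t, gain_at(scores, actual, t, fn_cost, fp_cost)) for t in candidates),
               key=lambda pair: pair[1])
```

A chosen threshold t corresponds to the model y' = y - t (intercept -6 - t) with the same "y' > 0" rule. Because t is picked on the same data it is scored on, the reported gain is optimistic for new data.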

Email the numbers and the steps you used to calculate them (you can do most of this homework in a spreadsheet [Excel?], but I highly encourage you to write code; learn Python if you're not sure where to start).


CISC 7700X HW# 4 (due by Nth class):

Using data from stockrow, take the previous 2 years of data (excluding the latest quarter!) and build linear [y = a + b*x], logarithmic [y = a + b*log(x)], exponential [y = b*exp(a*x)], and power-curve [y = b*x^a] models of revenue, earnings, and dividends for the symbols IBM, MSFT, AAPL, GOOG, FB, PG, and GE.
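All four models can be fit with ordinary least squares on a straight line after a log transform: the logarithmic model regresses y on ln(x), the exponential model regresses ln(y) on x (ln y = ln b + a*x), and the power model regresses ln(y) on ln(x) (ln y = ln b + a*ln x). This sketch ASSUMES x is the quarter index 1, 2, ..., 8 so ln(x) is defined; the exponential and power fits also need every y > 0, so a series with zero or negative values (e.g. a loss quarter or no dividend) cannot use them as written. The log-space fits minimize squared error in ln(y), not in y.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_models(xs, ys):
    """Fit the four models; return {name: (a, b, predict_fn)} using the homework's a/b naming."""
    models = {}
    a, b = fit_line(xs, ys)                                     # y = a + b*x
    models["linear"] = (a, b, lambda x, a=a, b=b: a + b * x)
    a, b = fit_line([math.log(x) for x in xs], ys)              # y = a + b*log(x)
    models["logarithmic"] = (a, b, lambda x, a=a, b=b: a + b * math.log(x))
    ln_b, a = fit_line(xs, [math.log(y) for y in ys])           # ln y = ln b + a*x
    b = math.exp(ln_b)
    models["exponential"] = (a, b, lambda x, a=a, b=b: b * math.exp(a * x))
    ln_b, a = fit_line([math.log(x) for x in xs],
                       [math.log(y) for y in ys])               # ln y = ln b + a*ln(x)
    b = math.exp(ln_b)
    models["power"] = (a, b, lambda x, a=a, b=b: b * x ** a)
    return models
```

Usage: `fit_models(list(range(1, 9)), revenue)` for eight quarterly values, then call each `predict_fn` on x = 1..8 to score the fit.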

Which model works best for each metric/symbol? Support your answer with numbers (e.g., the r-squared score). Read through: Coefficient of determination.
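The coefficient of determination compares a model's residual error to the spread of the data: R^2 = 1 - SS_res/SS_tot, where SS_res = sum((y_i - yhat_i)^2) and SS_tot = sum((y_i - mean(y))^2). This sketch computes it on the original y scale for every model's predictions, so the four models are compared on the same footing (an R^2 computed on log-transformed values is not directly comparable). Function names are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    if ss_tot == 0:
        # e.g. a dividend that stayed flat every quarter
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot

def best_model(actual, predictions):
    """Given {model_name: predicted values}, return (name, R^2) with the highest R^2."""
    scores = {name: r_squared(actual, preds) for name, preds in predictions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```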

Using the best model for each metric, predict the 'next quarter' revenue, earnings, and dividends. Remember, you did not use the latest quarter to build your models. Compare your model's prediction to the actual latest-quarter number. What is the error? [hint]
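If the eight training quarters are numbered x = 1..8 (an assumption, matching a fit by quarter index), the 'next quarter' prediction is the chosen model evaluated at x = 9. A small hypothetical helper for the comparison; the percent error is relative to the actual value and is undefined when that value is 0.

```python
def prediction_error(predicted, actual):
    """Return (signed error, percent error relative to the actual value)."""
    error = predicted - actual
    percent = 100 * error / actual if actual != 0 else float("nan")
    return error, percent

# Hypothetical numbers: a linear model y = 3 + 2*x fit on quarters 1..8,
# predicting quarter 9, compared against an actual value of 20.
a, b = 3.0, 2.0
predicted = a + b * 9
print(prediction_error(predicted, 20.0))  # (1.0, 5.0)
```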


CISC 7700X HW# 5 (due by Nth class):

Using data from spambase, build a Naive Bayes email classifier. Nothing too fancy: just a training module and a classifier module. Submit your code and the accuracy you get on the spambase dataset.
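One possible variant is Gaussian Naive Bayes, a common choice for continuous features like spambase's word and character frequencies (a Bernoulli or multinomial variant would also fit the assignment). This sketch ASSUMES each row is a list of numeric features with its class label supplied separately (e.g. taken from the label column). Training stores a log prior plus a per-feature mean and variance for each class; classification picks the class with the largest log posterior, working in logs to avoid underflow when many small probabilities are multiplied. `var_floor` is an assumed smoothing constant, not part of the assignment.

```python
import math
from collections import defaultdict

def train(rows, labels, var_floor=1e-3):
    """Training module: per-class log prior, feature means, and variances.

    `var_floor` keeps every variance positive (e.g. when a word never
    appears in one class); its value is an assumption to tune.
    """
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        columns = list(zip(*xs))
        means = [sum(c) / len(c) for c in columns]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + var_floor
                     for c, m in zip(columns, means)]
        model[y] = (math.log(len(xs) / len(rows)), means, variances)
    return model

def log_likelihood(v, mean, var):
    """Log of the normal density N(v; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def classify(model, x):
    """Classifier module: return the class with the largest log posterior."""
    def log_posterior(y):
        log_prior, means, variances = model[y]
        return log_prior + sum(log_likelihood(v, m, s)
                               for v, m, s in zip(x, means, variances))
    return max(model, key=log_posterior)

def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    hits = sum(classify(model, x) == y for x, y in zip(rows, labels))
    return hits / len(rows)
```

To report a meaningful accuracy, train on one part of spambase and call `accuracy` on a held-out part rather than on the training rows.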

© 2006, Particle