May 25th, 2024    

CISC 7700X

Confusion Matrix
Quantization Vector Quantization
Decision Trees
n-Tuple (src)
Guesstimating (src)
Big Data
HBase Primer
Spark Primer
Neural Nets
NN Links
End Notes

Past Tests
S2024 Final (key)
S2024 Midterm (key)
F2023 Final (key)
F2023 Midterm
S2023 Final (key)
S2023 Midterm (key)
F2022 Final (key)
F2022 Midterm (key)
F2021 Midterm (key)
F2021 Final (key)
F2020 Midterm (key)
F2020 Final (key)
F2018 Midterm (key)
F2018 Final (key)
F2019 Midterm (key)



Global K-means

Attntn All U Nd
Attntn All U Nd ntbk

DL generalization


MNIST train image
MNIST train labels
MNIST test image
MNIST test labels


CISC 7700X - Introduction to Data Science

CISC 7700X : TH 08:15-10:20PM TBA

Spring 2024 semester: class will be in person.

WhatsApp: group link

Office Hours:
I'm reachable online via email, before or after class, or during scheduled office hours. Please text me or send me an email to arrange.


[recommended] Doing Data Science: Straight Talk from the Frontline
by Cathy O'Neil and Rachel Schutt, Publisher: O'Reilly Media

[recommended] Data Science from Scratch: First Principles with Python
by Joel Grus

[recommended] Pattern Recognition and Machine Learning
by Christopher Bishop, Publisher: Springer

[recommended] Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis
by Mohammed Guller, Publisher: Apress

[recommended] Data Smart: Using Data Science to Transform Information into Insight
by John W. Foreman

[recommended] The Signal and the Noise: Why So Many Predictions Fail-But Some Don't
by Nate Silver

[recommended] Thoughtful Machine Learning: A Test-Driven Approach
by Matthew Kirk

[recommended] How to Lie with Statistics
by Darrell Huff



Data Science is an interdisciplinary field concerned primarily with extracting information from data. It incorporates aspects of computer science, statistics, analytics, and mathematics. This introductory course focuses on providing a broad overview of key concepts, such as data management, data preparation, analysis, machine learning, performance measures, and working with large data sets.


  1. Introduction: What is Data Science?
  2. Data Analysis, and the Data Science Process.
  3. Inference, Performance Measures, Confusion Matrix
  4. Basic Algorithms & Models
  5. Data Engineering & Feature Selection
  6. Logistic Regression
  7. Naive Bayes
  8. Midterm
  9. Mining Graphs, Recommendation Engines
  10. Clustering & Dimension Reduction Techniques
  11. Working with Big Data
  12. Deep Learning
  13. Data Visualization
  14. Ethical Issues & Review


There will be about 10 projects/homeworks.


You will have at least a midterm and a final exam. There might also be a surprise quiz every few weeks.

In This Class:

Peer cooperation is encouraged; however, everyone must submit their own work. You will be expected to answer detailed questions about your assignments/projects (i.e., if you didn't write them, I'll know).

ChatGPT (and others):
You are allowed to use whatever tools you like in this class. If you use a tool to generate an answer, you must credit it and provide the prompt you used to generate the answer.


Academic Integrity: The faculty and administration of Brooklyn College support an environment free from cheating and plagiarism. Each student is responsible for being aware of what constitutes cheating and plagiarism and for avoiding both. The complete text of the CUNY Academic Integrity Policy and the Brooklyn College procedure for implementing that policy can be found at this site: If a faculty member suspects a violation of academic integrity and, upon investigation, confirms that violation, or if the student admits the violation, the faculty member MUST report the violation.

CLASSROOM BEHAVIOR: Disruptive classroom behavior negatively affects the classroom environment as well as the educational experience for students enrolled in the course. Any serious or continued disruption of class will result in a report to the Office of Judicial Affairs. Public Safety will be summoned immediately if a serious disruption prevents the continued teaching of the class and you may be subject to disciplinary action. For disruptive behavior that does not prevent the continued teaching of the class, you will receive a warning after one such disruption. If the disruptive behavior is repeated in the same or subsequent classes, you may be asked to leave the classroom for the remainder of class and you may be subject to disciplinary action.

In other words, if you cheat on a test or an assignment, I must file an academic integrity report, which will initiate academic penalties.

Attendance is not mandatory (I don't need a doctor's note!), but highly recommended. [You must attend at least a few times in the first six weeks, or you will be dropped from the class with a WU grade.] Also, it would be VERY difficult to pass the class without regular attendance; you are responsible for catching up if you miss class (for any reason). That being said, if you hardly ever show up (miss >= 4 classes), don't expect to get anything but a WU grade.

All projects, assignments, homeworks, etc., will be submitted via email (subject line: "CISC 7700X HW#"). Do not print out the assignments - they will promptly be trashed.

Tentative grade breakdown: ~25% for Midterm, ~35% for Projects, ~40% for Final. These may change slightly depending on how well the class does in any of the above.
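As a rough illustration of how the tentative weights combine, here is a minimal sketch. The function name `course_grade`, the 0-100 score scale, and the example scores are assumptions made for illustration only; the actual weights may shift as noted above.

```python
# Tentative weights from the syllabus; they may change slightly.
WEIGHTS = {"midterm": 0.25, "projects": 0.35, "final": 0.40}


def course_grade(scores, weights=WEIGHTS):
    """Weighted average of component scores (each assumed to be on a 0-100 scale)."""
    total_weight = sum(weights.values())
    # Dividing by the total keeps the result on a 0-100 scale
    # even if the weights are adjusted and no longer sum to exactly 1.
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical scores, for illustration only.
example = {"midterm": 80.0, "projects": 90.0, "final": 70.0}
print(round(course_grade(example), 1))
```

With these made-up scores, the weighted sum is 0.25*80 + 0.35*90 + 0.40*70, which works out to 79.5.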

© 2006, Particle