December 5th, 2018    

CISC 7700X - Introduction to Data Science

CISC 7700X : Wednesdays 6:05-8:10, room: 236 NE

Primary E-Mail: alex at theparticle dot com
GoogleTalk: alex at theparticle dot com

Books:

[recommended] Doing Data Science: Straight Talk from the Frontline
by Cathy O'Neil and Rachel Schutt, Publisher: O'Reilly Media

[recommended] Data Science from Scratch: First Principles with Python
by Joel Grus

[recommended] Pattern Recognition and Machine Learning
by Christopher Bishop, Publisher: Springer

[recommended] Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis
by Mohammed Guller, Publisher: Apress

[recommended] Data Smart: Using Data Science to Transform Information into Insight
by John W. Foreman

[recommended] The Signal and the Noise: Why So Many Predictions Fail-But Some Don't
by Nate Silver

[recommended] Thoughtful Machine Learning: A Test-Driven Approach
by Matthew Kirk

[recommended] How to Lie with Statistics
by Darrell Huff

Description:

Data Science is an interdisciplinary field concerned primarily with extracting information from data. It incorporates aspects of computer science, statistics, analytics, and mathematics. This introductory course focuses on providing a broad overview of key concepts, such as data management, data preparation, analysis, machine learning, performance measures, and working with large data sets.

Outline

  1. Introduction: What is Data Science?
  2. Data Analysis and the Data Science Process
  3. Inference, Performance Measures, Confusion Matrix
  4. Basic Algorithms & Models
  5. Data Engineering & Feature Selection
  6. Logistic Regression
  7. Naive Bayes
  8. Midterm
  9. Mining Graphs, Recommendation Engines
  10. Clustering & Dimension Reduction Techniques
  11. Working with Big Data
  12. Deep Learning
  13. Data Visualization
  14. Ethical Issues & Review

Office Hours:

I'll be around right before and right after class. Also, by appointment.

Projects:

There will be about 10 projects/homeworks.

Tests:

You will have at least a midterm and a final exam. There might also be a surprise quiz every few weeks.

In This Class:

Peer cooperation is encouraged; however, everyone must submit their own work. You will be expected to answer detailed questions about your assignments/projects (i.e., if you didn't write them, I'll know).

Required:

Academic Integrity: The faculty and administration of Brooklyn College support an environment free from cheating and plagiarism. Each student is responsible for being aware of what constitutes cheating and plagiarism and for avoiding both. The complete text of the CUNY Academic Integrity Policy and the Brooklyn College procedure for implementing that policy can be found at this site: http://www.brooklyn.cuny.edu/bc/policies. If a faculty member suspects a violation of academic integrity and, upon investigation, confirms that violation, or if the student admits the violation, the faculty member MUST report the violation.

CLASSROOM BEHAVIOR: Disruptive classroom behavior negatively affects the classroom environment as well as the educational experience for students enrolled in the course. Any serious or continued disruption of class will result in a report to the Office of Judicial Affairs. Public Safety will be summoned immediately if a serious disruption prevents the continued teaching of the class and you may be subject to disciplinary action. For disruptive behavior that does not prevent the continued teaching of the class, you will receive a warning after one such disruption. If the disruptive behavior is repeated in the same or subsequent classes, you may be asked to leave the classroom for the remainder of class and you may be subject to disciplinary action.

For academic integrity, this means that if you cheat on a test or an assignment, I must file a report, which will initiate academic penalties.

Attendance is not mandatory (I don't need a doctor's note!), but it is highly recommended. [You must attend at least a few times in the first six weeks, or you will be dropped from the class with a WU grade.] Also, it would be VERY difficult to pass the class without regular attendance; you are responsible for catching up if you miss class (for any reason). That being said, if you hardly ever show up (miss >= 4 classes), don't expect to get anything but a WU grade.

All projects, assignments, homeworks, etc., will be submitted via email (subject line: "CISC 7700X HW#"). Do not print out the assignments - they will promptly be trashed.

Grading:
Tentative grade breakdown: ~25% for Midterm, ~35% for Projects, ~40% for Final. These may change slightly depending on how well the class does in any of the above.
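
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the tentative weights are used exactly as listed; the function name and the sample scores are hypothetical, and the actual weights may shift as noted above.

```python
# Tentative weights from the grading line above; they may change slightly.
WEIGHTS = {"midterm": 0.25, "projects": 0.35, "final": 0.40}


def course_grade(scores, weights=WEIGHTS):
    """Return the weighted average of component scores (each assumed 0-100)."""
    # Guard against weights that no longer add up to 100%.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical scores, for illustration only.
example = {"midterm": 80, "projects": 90, "final": 85}
print(round(course_grade(example), 2))
```

Keeping the weights in one dictionary means a changed breakdown only needs to be edited in one place.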

© 2006, Particle