CISC 7700X Final Exam
1. c
2. b
3. c
4. d
5. b
6. a
7. d
8. b
9. c
10. a
11. c
12. 0.96
P(G) = 0.8, P(D|G)=0.6, P(D|-G)=0.1
P(-G)=0.2, P(-D|G)=0.4, P(-D|-G)=0.9
P(G|D) = P(D|G)P(G) / ( P(D|G)P(G) + P(D|-G)P(-G) )
= ( 0.6 * 0.8 ) / ( ( 0.6 * 0.8 ) + ( 0.1 * 0.2 ) ) = 0.96000
// check: P(-G|D) = P(D|-G)P(-G) / ( P(D|-G)P(-G) + P(D|G)P(G) )
// = (0.1 * 0.2) / ((0.1 * 0.2) + ( 0.6 * 0.8 ) ) = 0.040000
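The arithmetic above can be checked numerically; a minimal sketch in Python (variable names are mine, not from the exam):

```python
# Q12 check: P(G|D) via Bayes' rule
p_g = 0.8            # P(G)
p_d_given_g = 0.6    # P(D|G)
p_d_given_ng = 0.1   # P(D|-G)

numerator = p_d_given_g * p_g                        # P(D|G)P(G)
denominator = numerator + p_d_given_ng * (1 - p_g)   # total probability P(D)
p_g_given_d = numerator / denominator

print(round(p_g_given_d, 5))  # 0.96
```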
13. 0.5
P(C|G) = 0.8, P(C|-G)=0.2;
P(G) = 0.8, P(-G)=0.2
P(-C|G) = 0.2, P(-C|-G)=0.8
P(G|-C) = P(-C|G)P(G) / ( P(-C|G)P(G) + P(-C|-G)P(-G) )
= (0.2 * 0.8) / ( (0.2 * 0.8) + (0.8 *0.2) ) = 0.50000
// check: P(-G|-C) = P(-C|-G)P(-G) / ( P(-C|-G)P(-G) + P(-C|G)P(G) )
// = 0.8*0.2 / ( 0.8*0.2 + 0.2*0.8 ) = 0.50000
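The same update pattern recurs in Q12, Q13, and Q15, so it can be wrapped in a small helper (a sketch; the function name is mine):

```python
def posterior(prior, like_pos, like_neg):
    """P(H|E) by Bayes' rule, given P(H), P(E|H), and P(E|-H)."""
    return like_pos * prior / (like_pos * prior + like_neg * (1 - prior))

# Q13: evidence is "did not take calculus" (-C)
print(posterior(0.8, 0.2, 0.8))  # 0.5
```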
14. Not enough information.
Bayes' rule would be:
P(G|D,-C) = P(D,-C|G)P(G) / P(D,-C)
but we don't know the joint likelihood P(D,-C|G); all we have is P(D|G) and P(-C|G).
15. 0.85714
P(G) = 0.8, P(-G) = 0.2, P(D|G)=0.6, P(D|-G)=0.1, P(-C|G) = 0.2, P(-C|-G)=0.8
assume D and -C are conditionally independent given G, i.e.: P(D,-C|G) = P(D|G)*P(-C|G)
P(G|D,-C) = P(D,-C|G)P(G)/ P(D,-C)
P(G|D,-C) = P(D|G)*P(-C|G)*P(G) / ( P(D|G)*P(-C|G)*P(G) + P(D|-G)*P(-C|-G)*P(-G) )
= 0.6*0.2*0.8 / ( 0.6*0.2*0.8 + 0.1*0.8*0.2 ) = 0.85714
Similarly, we can do it in sequence:
pretend we first learn that the student is a double-major; from Q12, our P(G) becomes 0.96.
We then plug that into Q13 (learning no-calculus) and solve, i.e.:
P(G|-C) = P(-C|G)P(G) / ( P(-C|G)P(G) + P(-C|-G)P(-G) )
= (0.2 * 0.96) / ( (0.2 * 0.96) + (0.8 * 0.04) ) = 0.85714
OR
pretend we first learn no-calculus; from Q13, our P(G) becomes 0.5.
We then plug that into Q12 (learning double-major) and solve, i.e.:
P(G|D) = P(D|G)P(G) / ( P(D|G)P(G) + P(D|-G)P(-G) )
= ( 0.6 * 0.5 ) / ( ( 0.6 * 0.5 ) + ( 0.1 * 0.5 ) ) = 0.85714
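All three routes (joint likelihood, D-then-(-C), (-C)-then-D) can be checked numerically. A sketch, assuming conditional independence of D and -C given G as above:

```python
def posterior(prior, like_pos, like_neg):
    """P(H|E) by Bayes' rule, given P(H), P(E|H), and P(E|-H)."""
    return like_pos * prior / (like_pos * prior + like_neg * (1 - prior))

p_g = 0.8
# Route 1: joint likelihood, multiplying P(D|.) and P(-C|.)
joint = (0.6 * 0.2 * p_g) / (0.6 * 0.2 * p_g + 0.1 * 0.8 * (1 - p_g))
# Route 2: update on D first, then on -C
seq1 = posterior(posterior(p_g, 0.6, 0.1), 0.2, 0.8)
# Route 3: update on -C first, then on D
seq2 = posterior(posterior(p_g, 0.2, 0.8), 0.6, 0.1)
print(joint, seq1, seq2)  # all three agree, ~0.85714
```

The order of the two updates doesn't matter, which is exactly what the two sequential derivations above demonstrate.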
16. 0.8
By automatically assigning a double-major and enrolling everyone in Calculus, those
variables lose any information they were giving us (all students are now double-majors taking calculus).
We are left with the prior information we had, which is P(G)=0.8.
17. b
18. Suppose n=100; then storing P(x_1,...,x_n|c) would require a table with at least 2^100 entries.
Similarly, if our model has 2^100 parameters, we'd need far more than 2^100 training instances to fill in the probability estimates.
Also, with a table that large, we'd essentially be memorizing the training data and recalling it at classification time (it would not generalize well).
Naive Bayes turns P(x_1,...,x_n|c) into P(x_1|c)P(x_2|c)...P(x_n|c); with n=100, we'd have 100 small tables.
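The scale difference is easy to make concrete (a sketch; the per-class count assumes binary features, two entries per feature):

```python
# Table sizes for n binary features: full joint vs. naive Bayes factorization
n = 100
full_joint_entries = 2 ** n   # one entry per combination of feature values
naive_bayes_entries = 2 * n   # P(x_i|c) and P(-x_i|c) for each feature, per class

print(full_joint_entries)   # 1267650600228229401496703205376
print(naive_bayes_entries)  # 200
```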
19. c
20. d