CISC 7700X Final Exam
1. c
2. b
3. c
4. d
5. b
6. a
7. d
8. b
9. c
10. a
11. c
12. 0.96
P(G) = 0.8, P(D|G)=0.6, P(D|-G)=0.1
P(-G)=0.2, P(-D|G)=0.4, P(-D|-G)=0.9
P(G|D) = P(D|G)P(G) / ( P(D|G)P(G) + P(D|-G)P(-G) )
= ( 0.6 * 0.8 ) / ( ( 0.6 * 0.8 ) + ( 0.1 * 0.2 ) ) = 0.96000
// check: P(-G|D) = P(D|-G)P(-G) / ( P(D|-G)P(-G) + P(D|G)P(G) )
// = (0.1 * 0.2) / ((0.1 * 0.2) + ( 0.6 * 0.8 ) ) = 0.040000
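The arithmetic above can be checked numerically; a minimal sketch in Python (variable names are mine, not from the exam):

```python
# Q12 check: P(G|D) via Bayes' rule
p_g = 0.8            # P(G)
p_d_given_g = 0.6    # P(D|G)
p_d_given_ng = 0.1   # P(D|-G)

numerator = p_d_given_g * p_g                        # P(D|G)P(G)
denominator = numerator + p_d_given_ng * (1 - p_g)   # total probability P(D)
p_g_given_d = numerator / denominator

print(round(p_g_given_d, 5))  # 0.96
```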
13. 0.5
P(C|G) = 0.8, P(C|-G)=0.2;
P(G) = 0.8, P(-G)=0.2
P(-C|G) = 0.2, P(-C|-G)=0.8
P(G|-C) = P(-C|G)P(G) / ( P(-C|G)P(G) + P(-C|-G)P(-G) )
= (0.2 * 0.8) / ( (0.2 * 0.8) + (0.8 *0.2) ) = 0.50000
// check: P(-G|-C) = P(-C|-G)P(-G) / ( P(-C|-G)P(-G) + P(-C|G)P(G) )
// = 0.8*0.2 / ( 0.8*0.2 + 0.2*0.8 ) = 0.50000
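The same update pattern recurs in Q12, Q13, and Q15, so it can be wrapped in a small helper (a sketch; the function name is mine):

```python
def posterior(prior, like_pos, like_neg):
    """P(H|E) by Bayes' rule, given P(H), P(E|H), and P(E|-H)."""
    return like_pos * prior / (like_pos * prior + like_neg * (1 - prior))

# Q13: evidence is "did not take calculus" (-C)
print(posterior(0.8, 0.2, 0.8))  # 0.5
```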
14. Not enough information.
Bayes' rule would be:
P(G|D,-C) = P(D,-C|G)P(G) / P(D,-C)
but we don't know the joint likelihood P(D,-C|G); all we have is P(D|G) and P(-C|G).
15. 0.85714
P(G) = 0.8, P(-G) = 0.2, P(D|G)=0.6, P(D|-G)=0.1, P(-C|G) = 0.2, P(-C|-G)=0.8
assume D and -C are conditionally independent given G, i.e.: P(D,-C|G) = P(D|G)*P(-C|G)
P(G|D,-C) = P(D,-C|G)P(G)/ P(D,-C)
P(G|D,-C) = P(D|G)*P(-C|G)*P(G) / ( P(D|G)*P(-C|G)*P(G) + P(D|-G)*P(-C|-G)*P(-G) )
= 0.6*0.2*0.8 / ( 0.6*0.2*0.8 + 0.1*0.8*0.2 ) = 0.85714
Similarly, we can do it in sequence:
pretend we first learn that the student is a double-major; from Q12, our P(G) becomes 0.96.
We then plug that into Q13 (learning no-calculus) and solve, i.e.:
P(G|-C) = P(-C|G)P(G) / ( P(-C|G)P(G) + P(-C|-G)P(-G) )
= (0.2 * 0.96) / ( (0.2 * 0.96) + (0.8 * 0.04) ) = 0.85714
OR
pretend we first learn no-calculus; from Q13, our P(G) becomes 0.5.
We then plug that into Q12 (learning double-major) and solve, i.e.:
P(G|D) = P(D|G)P(G) / ( P(D|G)P(G) + P(D|-G)P(-G) )
= ( 0.6 * 0.5 ) / ( ( 0.6 * 0.5 ) + ( 0.1 * 0.5 ) ) = 0.85714
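All three routes (joint likelihood, D-then-(-C), (-C)-then-D) can be checked numerically. A sketch, assuming conditional independence of D and -C given G as above:

```python
def posterior(prior, like_pos, like_neg):
    """P(H|E) by Bayes' rule, given P(H), P(E|H), and P(E|-H)."""
    return like_pos * prior / (like_pos * prior + like_neg * (1 - prior))

p_g = 0.8
# Route 1: joint likelihood, multiplying P(D|.) and P(-C|.)
joint = (0.6 * 0.2 * p_g) / (0.6 * 0.2 * p_g + 0.1 * 0.8 * (1 - p_g))
# Route 2: update on D first, then on -C
seq1 = posterior(posterior(p_g, 0.6, 0.1), 0.2, 0.8)
# Route 3: update on -C first, then on D
seq2 = posterior(posterior(p_g, 0.2, 0.8), 0.6, 0.1)
print(joint, seq1, seq2)  # all three agree, ~0.85714
```

The order of the two updates doesn't matter, which is exactly what the two sequential derivations above demonstrate.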
16. 0.8
By automatically assigning a double-major and enrolling everyone in Calculus, those
variables lose any information they were giving us (all students are now double-majors taking calculus).
We are left with the prior information we had, which is P(G)=0.8.
17. b
18. Suppose n=100; then storing P(x_1,...,x_n|c) would require a table with at least 2^100 entries.
Similarly, if our model has 2^100 parameters, we'd need far more than 2^100 training instances to fill in the probability estimates.
Also, with a table that large, we'd essentially be memorizing the training data and recalling it at classification time (it would not generalize well).
Naive Bayes turns P(x_1,...,x_n|c) into P(x_1|c)P(x_2|c)...P(x_n|c); with n=100, we'd have 100 small tables.
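The scale difference is easy to make concrete (a sketch; the per-class count assumes binary features, two entries per feature):

```python
# Table sizes for n binary features: full joint vs. naive Bayes factorization
n = 100
full_joint_entries = 2 ** n   # one entry per combination of feature values
naive_bayes_entries = 2 * n   # P(x_i|c) and P(-x_i|c) for each feature, per class

print(full_joint_entries)   # 1267650600228229401496703205376
print(naive_bayes_entries)  # 200
```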
19. c
20. d