CISC 7700X Final Exam
1. c
2. b
3. c
4. d
5. b
6. a
7. d
8. b
9. c
10. a
11. c
12. 0.96
P(G) = 0.8, P(D|G)=0.6, P(D|-G)=0.1
P(-G)=0.2, P(-D|G)=0.4, P(-D|-G)=0.9
P(G|D) = P(D|G)P(G) / ( P(D|G)P(G) + P(D|-G)P(-G) )
= ( 0.6 * 0.8 ) / ( ( 0.6 * 0.8 ) + ( 0.1 * 0.2 ) ) = 0.96000
// check: P(-G|D) = P(D|-G)P(-G) / ( P(D|-G)P(-G) + P(D|G)P(G) )
// = (0.1 * 0.2) / ((0.1 * 0.2) + ( 0.6 * 0.8 ) ) = 0.040000
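The arithmetic above can be checked numerically; a minimal sketch in Python (variable names are mine, not from the exam):

```python
# Q12 check: P(G|D) via Bayes' rule
p_g = 0.8            # P(G)
p_d_given_g = 0.6    # P(D|G)
p_d_given_ng = 0.1   # P(D|-G)

numerator = p_d_given_g * p_g                        # P(D|G)P(G)
denominator = numerator + p_d_given_ng * (1 - p_g)   # total probability P(D)
p_g_given_d = numerator / denominator

print(round(p_g_given_d, 5))  # 0.96
```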
13. 0.5
P(C|G) = 0.8, P(C|-G)=0.2;
P(G) = 0.8, P(-G)=0.2
P(-C|G) = 0.2, P(-C|-G)=0.8
P(G|-C) = P(-C|G)P(G) / ( P(-C|G)P(G) + P(-C|-G)P(-G) )
= (0.2 * 0.8) / ( (0.2 * 0.8) + (0.8 *0.2) ) = 0.50000
// check: P(-G|-C) = P(-C|-G)P(-G) / ( P(-C|-G)P(-G) + P(-C|G)P(G) )
// = 0.8*0.2 / ( 0.8*0.2 + 0.2*0.8 ) = 0.50000
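The same update pattern recurs in Q12, Q13, and Q15, so it can be wrapped in a small helper (a sketch; the function name is mine):

```python
def posterior(prior, like_pos, like_neg):
    """P(H|E) by Bayes' rule, given P(H), P(E|H), and P(E|-H)."""
    return like_pos * prior / (like_pos * prior + like_neg * (1 - prior))

# Q13: evidence is "did not take calculus" (-C)
print(posterior(0.8, 0.2, 0.8))  # 0.5
```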
14. Not enough information.
Bayes' rule would be:
P(G|D,-C) = P(D,-C|G)P(G) / P(D,-C)
but we don't know the joint likelihood P(D,-C|G); all we have is P(D|G) and P(-C|G).
15. 0.85714
P(G) = 0.8, P(-G) = 0.2, P(D|G)=0.6, P(D|-G)=0.1, P(-C|G) = 0.2, P(-C|-G)=0.8
assume D and -C are conditionally independent given G, i.e.: P(D,-C|G) = P(D|G)*P(-C|G)
P(G|D,-C) = P(D,-C|G)P(G)/ P(D,-C)
P(G|D,-C) = P(D|G)*P(-C|G)*P(G) / ( P(D|G)*P(-C|G)*P(G) + P(D|-G)*P(-C|-G)*P(-G) )
= 0.6*0.2*0.8 / ( 0.6*0.2*0.8 + 0.1*0.8*0.2 ) = 0.85714
Similarly, we can do it in sequence:
pretend we first learn that the student is a double-major; from Q12, our P(G) becomes 0.96.
We then plug that into Q13 (learning no-calculus) and solve, i.e.:
P(G|-C) = P(-C|G)P(G) / ( P(-C|G)P(G) + P(-C|-G)P(-G) )
= (0.2 * 0.96) / ( (0.2 * 0.96) + (0.8 * 0.04) ) = 0.85714
OR
pretend we first learn no-calculus; from Q13, our P(G) becomes 0.5.
We then plug that into Q12 (learning double-major) and solve, i.e.:
P(G|D) = P(D|G)P(G) / ( P(D|G)P(G) + P(D|-G)P(-G) )
= ( 0.6 * 0.5 ) / ( ( 0.6 * 0.5 ) + ( 0.1 * 0.5 ) ) = 0.85714
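All three routes (joint likelihood, D-then-(-C), (-C)-then-D) can be checked numerically. A sketch, assuming conditional independence of D and -C given G as above:

```python
def posterior(prior, like_pos, like_neg):
    """P(H|E) by Bayes' rule, given P(H), P(E|H), and P(E|-H)."""
    return like_pos * prior / (like_pos * prior + like_neg * (1 - prior))

p_g = 0.8
# Route 1: joint likelihood, multiplying P(D|.) and P(-C|.)
joint = (0.6 * 0.2 * p_g) / (0.6 * 0.2 * p_g + 0.1 * 0.8 * (1 - p_g))
# Route 2: update on D first, then on -C
seq1 = posterior(posterior(p_g, 0.6, 0.1), 0.2, 0.8)
# Route 3: update on -C first, then on D
seq2 = posterior(posterior(p_g, 0.2, 0.8), 0.6, 0.1)
print(joint, seq1, seq2)  # all three agree, ~0.85714
```

The order of the two updates doesn't matter, which is exactly what the two sequential derivations above demonstrate.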
16. 0.8
By automatically assigning a double-major and enrolling everyone in Calculus, those
variables lose any information they were giving us (all students are now double-majors taking calculus).
We are left with the prior information we had, which is P(G)=0.8.
17. b
18. Suppose n=100; then storing P(x_1,...,x_n|c) would require a table with at least 2^100 entries.
Similarly, if our model has 2^100 parameters, we'd need far more than 2^100 training instances to fill in the probability estimates.
Also, with a table that large, we'd essentially be memorizing the training data and recalling it at classification time (it would not generalize well).
Naive Bayes turns P(x_1,...,x_n|c) into P(x_1|c)P(x_2|c)...P(x_n|c); with n=100, we'd have 100 small tables.
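The scale difference is easy to make concrete (a sketch; the per-class count assumes binary features, two entries per feature):

```python
# Table sizes for n binary features: full joint vs. naive Bayes factorization
n = 100
full_joint_entries = 2 ** n   # one entry per combination of feature values
naive_bayes_entries = 2 * n   # P(x_i|c) and P(-x_i|c) for each feature, per class

print(full_joint_entries)   # 1267650600228229401496703205376
print(naive_bayes_entries)  # 200
```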
19. c
20. d