CISC 7510X Midterm Exam NAME: _______________________ //sgi;print'>midterm20211026.txt --> Pick the best answer that fits the question. Not all of the answers may be correct. If none of the answers fit, write your own answer. Answers must be emailed in plain text (no formatting, no attachments). Email answers to alex@theparticle.com Email subject must be "CISC 7510X Midterm Exam". Email must have your full name at the top. Answers to questions must be clearly marked (question number before each answer), and be in sequence (question 1 should come before question 2, etc.). Email must arrive by midnight on 2021-10-26. --------------------------------------------------------------------- For questions below, use the following schema definition. bbuser(uid, fname, lname, email, dob) post(sid, uid, tim, posttxt, pubtype) friend(tim, from_uid, to_uid) unfriend(tim, from_uid, to_uid) likeevent(tim, uid, sid) It's a schema for a BlueBook social network (the one from Ex Machina). It functions similar to other social networks: users have accounts (bbuser table), they can author posts (post table; if pubtype is F, then only friends can see the posts, etc.) users can click on other users and ask to be friends with them (the friend table). (whenever a friend button is clicked, a record is added to friend table). the other user must acknowledge the friendship by clicking the friend button too. (friendships are 2-way; friendships that are not acknowledged are pending). users can click a button to unfriend an existing friend (the unfriend table). users can click a button to like posts (the likeevent table). 1. (5 points) Find email of bbuser with id=12345 a. select * from bbuser where uid = 12345; b. select email from bbuser where uid = 12345; c. select * from user; d. select email from bbuser natural inner join post where uid=12345; e. (write your own answer) 2. (5 points) Find the user id of John Doe, dob: 1999-01-01. a. select uid from bbuser where uid=12345; b. select uid from bbuser where (fname,lname)=('John','Doe'); c. select uid from bbuser where fname='John' and lname='Doe' and dob=cast('1999-01-01' as date); d. select * from bbuser where (lname,fname,dob)=('John','Doe','1999-01-01'); e. (write your own answer) 3. (5 points) Get all posts by user id=12345, in time order. a. select * from post where uid=12345 order by tim; b. select * from bbuser a natural inner join post where uid=12345 order by tim; c. select b.* from bbuser a inner join post b on a.uid=b.uid where a.uid=12345; d. select * from bbuser a inner join post b on a.uid=b.uid where a.uid=12345 order by tim; e. (write your own answer) 4. (5 points) Get all posts (uid, tim, posttxt) by John Doe, dob: 1999-01-01, in time order. a. select uid, tim, posttxt from post where uid=12345 order by tim; b. select uid, tim, posttxt from bbuser a natural inner join post where (lname,fname,dob)=('John','Doe','1999-01-01') order by tim; c. select b.uid, b.tim, b.posttxt from bbuser a inner join post b on a.uid=b.uid where fname='John' and lname='Doe' and dob=cast('1999-01-01' as date) order by b.tim; d. select * from post where uid in (select uid from bbuser where (lname,fname,dob)=('John','Doe','1999-01-01')) order by tim; e. (write your own answer) 5. (5 points) Get all posts (uid, tim, posttxt) that john.doe@johndoe.com likes. a. select c.uid, c.tim, c.posttxt from bbuser a natural inner join likeevent b natural inner join post c where a.email='john.doe@johndoe.com' order by c.tim; b. select c.uid, c.tim, c.posttxt from bbuser a inner join likeevent b on a.uid=b.uid inner join post c on b.uid=c.uid where a.email='john.doe@johndoe.com' order by c.tim; c. select a.uid, c.tim, c.posttxt from bbuser a inner join likeevent b on a.uid=b.uid inner join post c on b.sid=c.sid where a.email='john.doe@johndoe.com' order by c.tim; d. select c.uid, c.tim, c.posttxt from bbuser a inner join likeevent b on a.uid=b.uid inner join post c on b.sid=c.sid where a.email='john.doe@johndoe.com' order by c.tim; e. (write your own answer) 6. (5 points) Did Jack Johnson ever tried to friend John Jackson? (e.g. clicked a button that created a friend record). a. select count(*)>0 from bbuser a inner join friend b on a.uid=b.from_uid inner join bbuser c on b.to_uid =c.uid where (a.fname, a.lname)=('Jack','Johnson') and (c.fname, c.lname)=('John','Jackson'); b. select count(*)>0 from bbuser a inner join friend b on a.uid=b.from_uid inner join friend c on b.to_uid=c.from_uid inner join bbuser d on c.to_uid =d.uid where (a.fname, a.lname)=('Jack','Johnson') and (d.fname, d.lname)=('John','Jackson'); c. select count(*)>0 from bbuser a natural inner join friend natural inner join bbuser b where (a.fname, a.lname)=('Jack','Johnson') and (b.fname, b.lname)=('John','Jackson'); d. select count(*)>0 from bbuser a inner join bbuser b on a.uid!=b.uid inner join friend c on a.uid=c.from_uid and b.uid=c.to_uid and (a.fname, a.lname)=('Jack','Johnson') and (b.fname, b.lname)=('John','Jackson'); e. (write your own answer) 7. (5 points) Did Jack Johnson ever like any posts by John Jackson? a. select count(*)>0 from bbuser a inner join likeevent b on a.uid=b.uid inner join bbuser c on b.uid=c.uid where (a.fname, a.lname)=('Jack','Johnson') and (c.fname, c.lname)=('John','Jackson'); b. select count(*)>0 from bbuser a inner join likeevent b on a.uid=b.uid inner join post c on b.sid=c.sid inner join bbuser d on c.uid=d.uid where (a.fname, a.lname)=('Jack','Johnson') and (d.fname, d.lname)=('John','Jackson'); c. select count(*)>0 from bbuser a inner join likeevent b on a.uid=b.uid inner join post c on b.sid=c.sid inner join bbuser d on c.uid=d.uid where (a.fname, a.lname)=('John','Jackson') and (d.fname, d.lname)=('Jack','Johnson'); d. select count(*)>0 from bbuser a natural inner join likeevent natural inner join post natural inner join bbuser d where (a.fname, a.lname)=('John','Jackson') and (d.fname, d.lname)=('Jack','Johnson'); e. (write your own answer) 8. (5 points) What's the 10 most popular email domains among BlueBook users? (e.g. for email bob@yahoo.com, username=bob, domain=yahoo.com ) a. select substr(email, position('@' in email)+1, 1000) edomain,count(*) cnt from bbuser group by 1 order by 2 desc limit 10; b. with edomain as ( select substr(email, position('@' in email)+1, 1000) edomain,count(*) cnt from bbuser group by 1 ), mx as ( select max(cnt) m from edomain ) select edomain from edomain cross join mx where cnt = m limit 10; c. with edomain as ( select substr(email, position('@' in email)+1, 1000) edomain,count(*) cnt from bbuser group by 1 ), rnk as ( select a.*, rank() over (order by edomain) r from edomain a ) select edomain from rnk where r <= 10; d. with rnk as ( select substr(email, position('@' in email)+1, 1000) edomain, dense_rank() over (order by count(*) desc) r from bbuser group by 1 ) select edomain from rnk where r <= 10; e. (write your own answer) 9. (5 points) What's the 10 most popular email domains for each age group, where age groups are: 0-18, 19-29, 30-49, 50-65, 66-and up. a. with bbuserwage as ( select a.*, extract(year from age(dob)) age, substr(email, position('@' in email)+1, 1000) edomain from bbuser a ), bbuserwagegrp as ( select a.*, case when age between 0 and 18 then '0-18' when age between 19 and 29 then '19-29' when age between 30 and 49 then '30-49' when age between 50 and 65 then '50-65' else '66-up' end agegrp from bbuserwage a ), rnk as ( select agegrp, edomain, dense_rank() over (order by count(*) desc) r from bbuserwagegrp a group by agegrp, edomain ) select a.* from rnk a where r<=10; b. with bbuserwage as ( select a.*, extract(year from age(dob)) age from bbuser a ), bbuserwagegrp as ( select substr(email, position('@' in email)+1, 1000) edomain, case when age < 18 then '0-18' when age < 29 then '19-29' when age < 49 then '30-49' when age < 65 then '50-65' else '66-up' end agegrp, dense_rank() over (order by count(*) desc) r from bbuserwage a group by 1,2 ) select a.* from bbuserwagegrp a where r<=10; c. with rnk as ( select case when extract(year from age(dob)) < 18 then '0-18' when extract(year from age(dob)) < 29 then '19-29' when extract(year from age(dob)) < 49 then '30-49' when extract(year from age(dob)) < 65 then '50-65' else '66-up' end agegrp, substr(email, position('@' in email)+1, 1000) edomain, dense_rank() over (order by count(*) desc ) r from bbuser a group by 1,2 ) select * from rnk where r<=10; d. with rnk as ( select case when extract(year from age(dob)) < 18 then '0-18' when extract(year from age(dob)) < 29 then '19-29' when extract(year from age(dob)) < 49 then '30-49' when extract(year from age(dob)) < 65 then '50-65' else '66-up' end agegrp, substr(email, position('@' in email)+1, 1000) edomain, rank() over (order by count(*)) r from bbuser a group by 1,2 ) select * from rnk where r<=10; e. (write your own answer) 10.(5 points) It has come to our attention that the like button may be clicked multiple times---each time generating a new likeevent (with a new timestamp). Create a likeevent_fixed table that has deduped likeevents: only the first likeevent should be counted---the rest should not be included in likeevent_fixed. For future questions, you may assume that likeevent is fixed. a. create table likeevent_fixed as select distinct a.* from likeevent a; b. create table likeevent_fixed as with blah as ( select a.*, row_number() over (partition by uid order by tim) rn from likeevent a ) select * from blah where rn=1; c. create table likeevent_fixed as with blah as ( select a.*, row_number() over (partition by uid,sid order by tim desc) rn from likeevent a ) select * from blah where rn=1; d. create table likeevent_fixed as select min(tim) tim, uid,sid from likeevent group by 2,3; e. (write your own answer) 11.(5 points) It has come to our attention that the friend button may be clicked multiple times---each time generating a new friend event (with a new timestamp). Create a friend_fixed table that has deduped friend events: TIP: unlike the like button, an unfriend-ed friend may be re-friended. e.g.: friend from a to b friend from a to b <-- redundunt and should be removed in friend_fixed unfriend from a to b <-- friendship ended friend from a to b <-- new friendship For future questions, you may assume that friend/unfriend events are fixed of this issue. a. create table friend_fixed as select distinct * from friend; b. create table friend_fixed as select min(tim) tim, from_uid, to_uid from friend group by 2,3; c. create table friend_fixed as with evts as ( select 'F' t, tim, from_uid, to_uid from friend union all select 'U' t, tim, from_uid, to_uid from unfriend ), evtslg as ( select a.*,lag(t) over (partition by from_uid, to_uid order by tim) lt from evts a ) select tim, from_uid, to_uid from evtslg where t='F' and coalesce(lt,'U')='U'; d. create table friend_fixed as with evtslg as ( select a.*,lag(tim) over (partition by from_uid, to_uid order by tim) lag_tim from friend a ) select a.tim, a.from_uid, a.to_uid from evtslg a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid where b.tim is null or (b.tim >= a.tim and b.tim <= a.lag_tim); e. (write your own answer) 12.(5 points) BlueBook user 12345 created a post. Select user ids of all their friends who will see it. TIP: friendships are 2-way. a. select to_uid from friend where from_uid=12345; b. select to_uid from friend where from_uid=12345 except select to_uid from unfriend where from_uid=12345; c. select a.to_uid from friend a inner join friend b on a.from_uid=b.to_uid and a.to_uid=b.from_uid left outer join unfriend c on a.from_uid=c.from_uid and a.to_uid=c.to_uid where c.tim is null and a.from_uid=12345; d. with frnd as ( select a.from_uid, a.to_uid from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ) select a.to_uid from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid where a.from_uid=12345; e. (write your own answer) 13.(5 points) BlueBook user 12345 logs in, select 20 most recent posts (uid, tim, posttxt) from their friends they will see. TIP: friendships are 2-way. a. select uid, tim, posttxt from post where uid=12345 order by tim desc limit 20; b. with uids as ( select from_uid from friend where to_uid=12345 except select from_uid from unfriend where to_uid=12345 ), rnk as ( select a.uid, a.tim, a.posttxt, dense_rank() over (partition by a.uid order by a.tim desc) r from post a inner join uids b on a.uid=b.from_uid ) select uid, tim, posttxt from rnk where r <= 20; c. with frnd as ( select a.from_uid, a.to_uid from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ), rnk as ( select c.uid, c.tim, c.posttxt, dense_rank() over (partition by a.to_uid order by c.tim desc) r from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid inner join post c on a.from_uid=c.uid where a.to_uid=12345 ) select uid, tim, posttxt from rnk where r <= 20; d. select d.uid, d.tim, d.posttxt from friend a inner join friend b on a.from_uid=b.to_uid and a.to_uid=b.from_uid left outer join unfriend c on a.from_uid=c.from_uid and a.to_uid=c.to_uid inner join post d on a.from_uid=d.uid where c.tim is null and a.to_uid=12345 order by 2 desc limit 20; e. (write your own answer) 14.(5 points) We suspect click-fraud (attempt 1): identify user ids that clicked like button more than 10000 times. a. select uid from likeevent group by uid having count(*) > 10000; b. with cnts as ( select count(*) cnt from likeevent ) select * from cnts where cnt > 10000; c. with cnts as ( select uid, count(*) over (partition by uid order by tim) cnt from likeevent ) select * from cnts where cnt > 10000; d. select count(*) cnt from bbuser a inner join likeevent b on a.uid=b.uid group by a.uid having count(*) > 10000; e. (write your own answer) 15.(5 points) We suspect click-fraud (attempt 2), identify user ids that are in top 1% of like-button clickers. a. with cnts as ( select uid, count(*) cnt from likeevent group by uid ), pr as ( select a.*, percentile_cont(0.99) within group (order by cnt) pr from cnts a ) select * from pr where cnt >= pr; b. with cnts as ( select uid, count(*) cnt from likeevent group by uid ), rnk as ( select a.*, rank() over (order by cnt desc) pr from cnts a ) select * from rnk where pr < 1; c. with cnts as ( select uid, 1.0*row_number() over (order by count(*) desc) / count(*) as pr from likeevent group by uid ) select uid from cnts where pr <= 0.01; d. with cnts as ( select uid, count(*) as cnt from likeevent group by uid ), wrn as ( select a.*, row_number() over (order by cnt) rn from cnts a ) select * from wrn where rn/cnt <= 0.01; e. (write your own answer) 16.(5 points) We suspect click-fraud (attempt 3), identify user ids that are in top 1% of like button clickers within any day. (TIP: top 1% within each day). a. with cnts as ( select cast(tim as date) dt, uid, count(*) cnt from likeevent group by 1,2 ), pr as ( select a.*, percentile_cont(0.99) within group (partition by dt order by cnt) pr from cnts a ) select * from pr where cnt >= pr; b. with cnts as ( select cast(tim as date) dt, uid, count(*) cnt from likeevent group by 1,2 ), rnk as ( select a.*, rank() over (partition by dt order by cnt desc) pr from cnts a ) select * from rnk where pr < 1; c. with cnts as ( select cast(tim as date) dt, uid, 1.0*row_number() over (partition by cast(tim as date) order by count(*) desc) / count(*) as pr from likeevent group by 1,2 ) select dt, uid from cnts where pr <= 0.01; d. with cnts as ( select cast(tim as date) dt, uid, count(*) cnt from likeevent group by 1,2 ), wrn as ( select a.*, row_number() over (partition by dt order by cnt) rn from cnts a ) select * from wrn where rn/cnt <= 0.01; e. (write your own answer) 17.(5 points) How many friendships are there? (Friendships are 2-way, so if A is friend of B and B is friend of A, that only counts as 1 friendship). a. select count(*)/2 - (select count(*) from unfriend) from friend; b. with frnd as ( select a.from_uid, a.to_uid from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ) select count(*) from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid where a.from_uid < a.to_uid; c. select count(*) from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid where a.from_uid < a.to_uid having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ); d. with cnts as ( select 1 t, tim, from_uid, to_uid from friend union all select -1 t, tim, from_uid, to_uid from unfriend ), stats as ( select from_uid, to_uid, sum(t) s from cnts group by 1,2 ) select sum(s) cnt from stats; e. (write your own answer) 18.(5 points) How many friendships are re-friended (were broken with an unfriend event, and then re-friended?). a. with frnd as ( select a.from_uid, a.to_uid from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ) select count(*) from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid inner join unfriend c on a.from_uid=c.from_uid and a.to_uid=c.to_uid where a.from_uid < a.to_uid; b. with frnd as ( select a.from_uid, a.to_uid, max(case when b.tim is not null then 1 else 0 end) had_unfriend from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ) select count(*) from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid where a.from_uid < a.to_uid and (a.had_unfriend>0 or b.had_unfriend>0); c. with stats as ( select 1 t, tim, from_uid, to_uid from friend union all select -1 t, tim, from_uid, to_uid from unfriend ), stats2 as ( select from_uid, to_uid, sum(t) friend_status, max(case when t < 0 then 1 else 0 end) had_unfriend from stats group by 1,2 ) select count(*) from stats2 where friend_status = 0 and had_unfriend > 0; d. with evts as ( select 'F' t, tim, least(from_uid,to_uid) uid1, greatest(from_uid,to_uid) uid2, case when from_uid lead(t) over (partition by uid1,uid2 order by tim rows between unbounded preceding and current row) lead_t, case when sum(case when d=0 and t='F' then 1 when d=0 and t='U' then -1 else 0 end) over (partition by uid1, uid2 order by tim) > 0 and sum(case when d=1 and t='F' then 1 when d=1 and t='U' then -1 else 0 end) over (partition by uid1, uid2 order by tim) > 0 then 1 else 0 end fstatus from evts a ), stats2 as ( select uid1,uid2, last_value(fstatus) over (partition by uid1,uid2 order by tim rows between unbounded preceding and unbounded following) fstatus, max(case when fstatus=1 and lead_t='U' then 1 else 0 end) over (partition by uid1,uid2 order by tim rows between unbounded preceding and unbounded following) had_unfriend from stats ) select count(*) from stats2 where fstatus=1 and had_unfriend>0; e. (write your own answer) 19.(5 points) Find users with less than 2 friends. TIP: friendships are 2-way, and some users may have no friends. a. with frnd as ( select a.from_uid, a.to_uid from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ), frnds as ( select a.from_uid, a.to_uid from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid ) select from_uid,count(*) from frnds group by from_uid having count(*) < 2; b. with frnd as ( select a.from_uid, a.to_uid from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ), frnds as ( select a.from_uid, a.to_uid from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid ) select a.uid from bbuser a left outer join frnds b on a.uid = b.from_uid group by a.uid having count(b.to_uid) < 2; c. with frnd as ( select a.from_uid, a.to_uid from friend a left outer join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ), frnds as ( select a.from_uid, a.to_uid from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid ) select a.uid from bbuser a inner join frnds b on a.uid = b.from_uid group by a.uid having count(*) < 2; d. with frnd as ( select a.from_uid, a.to_uid from friend a inner join unfriend b on a.from_uid=b.from_uid and a.to_uid=b.to_uid group by 1,2 having max(a.tim) >= coalesce( max(b.tim), max(a.tim) ) ), frnds as ( select a.from_uid, a.to_uid from frnd a inner join frnd b on a.from_uid=b.to_uid and a.to_uid=b.from_uid ) select a.uid from bbuser a inner join frnds b on a.uid = b.from_uid group by a.uid having count(*) < 2; e. (write your own answer) 20. (5 points) Below query is identical to: sel1. b 2. c 3. a 4. c 5. d or e (typo) 6. a 7. b 8. d 9. a 10. d 11. c 12. d 13. c 14. a 15. c 16. c 17. b 18. d or e 19. b 20. cect a.*,b.val c from T1 a left outer join T2 b on a.key=b.key and a.val!=b.val a. select a.*,b.val c from T1 a inner join T2 b on a.key=b.key and a.val!=b.val; b. with TMP as ( select a.*,b.val c f1. b 2. c 3. a 4. c 5. d or e (typo) 6. a 7. b 8. d 9. a 10. d 11. c 12. d 13. c 14. a 15. c 16. c 17. b 18. d or e 19. b 20. crom T1 a left outer join T2 b on a.key=b.key where a.val!=b.val ) select a.* from TMP where a.val!=b.val; c. with TMP as ( select a.*,b.val c from T1 a inner join T2 b on a.key=b.key where a.val!=b.val ) select a.*,b.c from T1 a left outer join TMP b on a.key=b.key; d. All of the above queries are identical. e. None of the queries are identical to the question.