CISC 7512X Final Exam
NAME: _______________________
Pick the best answer that fits the question. Not all of the answers may be correct.
If none of the answers fit, write your own answer.
Answers must be emailed in plain text (no formatting, no attachments).
Email must have your full name and "CISC 7512X" at the top.
Answers to questions must be clearly marked (question number before each answer),
and be in sequence (question 1 should come before question 2, etc.).
Email must arrive by 8:00AM on 2021-05-25.
For questions below, use the following schema definition.
customer(cid, fname, lname, street1, stree2, city, state, zip, dob, ssn)
policy(pid, cid, tim, typeid, premium, amnt, termid)
incident(iid, pid, tim, cost, covered)
It's a schema for an insurance company: Customers pay (premium) for policies that cover a
particular type (typeid) of event during a particular term (termid) for upto a certain dollar value (amnt).
Incidents (during the policy) have a cost, and may be covered or not.
e.g.
Bob pays $1k for a $1m policy on lightning strikes for 2021.
customer: has Bob's information
policy: type is lightning, term is 2021, premium=$1000, amnt is $1000000.
Bob gets an electric shock from an outlet (insurance company does not pay; not lightning):
incident: type = electric shock, cost $300, covered=no (false).
Bob gets a bit hurt by lightning (insurance company covers costs):
incident: type = lightning, cost $170000, covered=yes.
Bob gets a very hurt by lightning (insurance company pays out max amnt):
incident: type = lightning, cost $1000000, covered=yes.
1. (5 points) Find time when policy 235235 was created.
a. select tim
from policy
where cid=235235;
b. select tim
from policy
where pid=235235;
c. select *
from customer
where cid=236235;
d. select *
from policy
left outer join customer
using (cid)
where pid=235235;
e. (write your own answer)
2. (5 points) How many customers bought policies in January 2021?
a. select count(*) as cnt
from customer
natural inner join policy
natural left outer join incident
where tim>=cast('2021-01-01' as date) and
tim=cast('2021-01-01' as date) and
tim=cast('2021-01-01' as date) and
tim=cast('2021-01-01' as date) and
tim
e. (write your own answer)
3. (5 points) What percentage of customers had policies during termid=235235
a. select 100.0*sum(case when termid=235235 then 1.0 else 0.0 end)/sum(1.0) prcnt
from policy a;
b. with dcust as (
select distinct cid
from policy
where termid=235235
)
select 100.0*sum(cid)/sum(1.0) prcnt
from dcust a;
c. with dcust as (
select a.cid, min(pid) m
from customer a
left outer join policy b
on a.cid=b.cid and b.termid=235235
group by a.cid
)
select 100.0*sum(case when m is not null then 1.0 else 0.0 end)/sum(1.0) prcnt
from dcust;
d. select 100.0*sum(case when b.termid is not null then 1.0 else 0.0 end)/sum(1.0) prcnt
from customer a
left outer join policy b
on a.cid=b.cid and b.termid=235235;
e. (write your own answer)
4. (5 points) Find policies that have not had any incidents.
a. select a.*
from policy a
inner join incident b
using (pid)
where b.iid is null;
b. select a.*
from policy a
natural left outer join incident b
where b.iid is null;
c. select a.*
from policy a
left outer join incident b using (pid)
where b.pid is null;
d. select a.*
from policy a
cross join incident b
where a.pid=b.pid and b.iid is null;
e. (write your own answer)
5. (5 points) Find average time (in seconds) between policy creation and first incident.
a. select avg( extract( epoch from b.tim - a.tim) )
from policy a
inner join incident b
on a.pid = b.pid;
b. with diff as (
select extract(epoch from min(b.tim) - a.tim) d
from policy a
natural inner join incident b
group by a.pid,a.tim
)
select avg(d)
from diff;
c. with diff as (
select extract(epoch from min(b.tim) - a.tim) d
from policy a
inner join incident b using (pid)
group by a.pid,a.tim
)
select avg(d)
from diff;
d. with evts as (
select pid, tim from policy
union all
select pid, tim from incident
),
diff as (
select extract(epoch from lag(tim) over (partition by pid order by tim) - tim) d
from evts
)
select avg(d)
from diff;
e. (write your own answer)
6. (5 points) Identify the most popular policy type (most distinct customers).
a. with stats as (
select b.pid,count(a.cid) cnt, max( count(a.cid) ) over () maxcnt
from customer a
natural left outer join policy b
natural left outer join incident c
group by b.pid
)
select *
from stats
where cnt = maxcnt;
b. with stats as (
select typeid, count(distinct cid) cnt, max( count(distinct cid) ) over () maxcnt
from policy
group by typeid
)
select *
from stats
where cnt = maxcnt;
c. with stats as (
select typeid, sum(1) cnt
from policy
group by typeid
),
stats2 as (
select typeid, rank() over (order by cnt) rnk
from stats
)
select *
from stats2
where rnk=1;
d. with stats as (
select typeid, sum(1) cnt
from policy
group by typeid
),
stats2 as (
select typeid, dense_rank() over (order by cnt) rnk
from stats
)
select *
from stats2
where rnk=1;
e. (write your own answer)
7. (5 points) Identify customers with most policies for term=124.
a. with stats as (
select cid, rank() over ( order by count(distinct pid) desc) r
from policy
where termid=124
group by cid
)
select *
from stats
where r=1;
b. with stats as (
select a.cid, count(distinct b.pid) cnt
from customer a
natural inner join policy b
where termid=124
group by a.cid
),
mx as (
select max(cnt) m
from stats
)
select *
from stats cross join mx
where m = cnt;
c. with stats as (
select cid, count(*) cnt, min( count(*) ) over () mx
from policy
where termid=124
group by cid
)
select *
from stats
where cnt = mx;
d. select cid
from policy
where termid=124
group by cid
having count(cid) >= all(select count(cid) from customer);
e. (write your own answer)
8. (5 points) Customers may have more than one incident. What's the average
number of covered incidents do customers with policies have?
a. with stats as (
select count(*) cnt
from policy a
natural inner join incident b
where b.covered = true
group by a.cid
)
select avg(cnt) from stats;
b. with stats as (
select a.cid, sum(case when b.iid is not null then 1 else 0 end) cnt
from policy a
left outer join incident b
on a.pid=b.pid and b.covered = true
group by a.cid
)
select avg(cnt) from stats;
c. with stats as (
select a.cid, sum(case when b.iid is not null then 1 else 0 end) cnt
from policy a
left outer join incident b
on a.pid=b.pid
where b.covered = true
group by a.cid
)
select avg(cnt) from stats;
d. with stats as (
select distinct cid, iid
from policy
natural left outer join incident
where covered = true
),
cnts as (
select count(*) cnt
from stats
group by cid
having count(*) is not null
)
select avg(cnt) from cnts;
e. (write your own answer)
9. (5 points) What percentage of incidents are covered?
a. select 100.0 * sum(case when covered is true then 1 else 0 end) /
sum(case when covered is false then 1 else 0 end) prcnt
from policy a
inner join incident b
on a.pid=b.pid;
b. select 100.0 * (select count(*) from incident where covered=true)/
(select count(*) from incident where covered=false) prcnt;
c. select 100.0*sum(case when a.covered=b.covered then 1.0 else 0.0 end)/sum(1.0) prcnt
from incident a cross join incident b
where a.pid!=b.pid;
d. select 100.0*sum(case when covered=true then 1.0 else 0.0 end)/sum(1.0) prcnt
from incident a;
e. (write your own answer)
10. (5 points) Find unusually expensive terms (termid)
(terms that are 2 standard deviations more expensive than the average cost).
a. with stats as (
select termid, sum(b.cost) as cost
from policy a
natural inner join incident b
where b.covered = true
group by termid
),
stats2 as (
select a.*,
avg( cost ) over () avgcost,
stddev( cost ) over () sdcost
from stats a
),
stats3 as (
select *
from stats2
where cost >= avgcost+2*sdcost
)
select *
from stats3;
b. with stats as (
select termid, sum(b.cost) as cost,
avg( sum(b.cost) ) over () avgcost,
stddev( sum(b.cost) ) over () sdcost
from policy a
inner join incident b
on a.pid=b.pid and b.covered = true
group by termid
)
select *
from stats
where cost >= avgcost+2*sdcost;
c. with tcost as (
select termid, sum(b.cost) as cost
from policy a
inner join incident b
on a.pid=b.pid and b.covered = true
group by termid
)
select a.*
from tcost a
group by termid, cost
having cost >= avg( cost ) + 2*stddev( cost );
d. with tcost as (
select termid, sum(case when b.covered = true then b.cost else 0 end) as cost
from policy a
inner join incident b
on a.pid=b.pid
group by termid
having sum(case when b.covered = true then b.cost else 0 end) > 0
),
stats as (
select avg(cost) avgcost, stddev(cost) sdcost
from tcost
)
select termid
from tcost a natural inner join stats
group by termid
having max(cost) >= max(avgcost)+2*max(sdcost)
e. (write your own answer)
11. (5 points) Find unusually profitable terms
(terms that are 2 standard deviations more profitable than average:
profit is policy premium - payed out costs).
a. with profits as (
select termid, sum(a.premium) - sum(b.cost) profit,
avg( sum(a.premium) - sum(b.cost) ) over () avgprofit,
stddev( sum(a.premium) - sum(b.cost) ) over () sdprofit
from policy a
inner join incident b -- should be left outer join.
on a.pid=b.pid and b.covered=true
group by termid
)
select *
from profits
where profit >= avgprofit+2*sdprofit;
b. with stats as (
select termid, sum(a.premium) - sum(b.cost) profit,
from policy a
natural inner join incident b
where b.covered = true
group by termid
),
stats2 as (
select a.*,
avg( profit ) over () avgprofit,
stddev( profit ) over () sdprofit
from stats a
),
stats3 as (
select *
from stats2
where profit >= avgprofit+2*sdprofit
)
select *
from stats3;
c. with tstat as (
select termid, sum(a.premium) -
sum(case when b.covered = true then b.cost else 0 end) as profit
from policy a
inner join incident b
on a.pid=b.pid
group by termid
having sum(a.premium) - sum(case when b.covered = true then b.cost else 0 end) > 0
),
stats as (
select avg(profit) avgprofit, stddev(profit) sdprofit
from tstat
)
select termid
from tstat a natural inner join stats
group by termid, profit, avgprofit, sdprofit
having profit >= avgprofit + 2* sdprofit;
d. with revenue as (
select termid, sum(premium) rev
from policy
group by termid
),
costs as (
select termid, sum(cost) as cost
from policy natural inner join incident
group by termid
),
profit as (
select a.termid, a.rev - coalesce(b.cost,0) profit
from revenue a
left outer join costs b
on a.termid=b.termid
),
stats as (
select avg(profit) avgprofit, stddev(profit) sdprofit
from profit
)
select *
from profit a cross join stats
where profit >= avgprofit + 2* sdprofit;
e. (write your own answer)
12. (5 points) What is the geometric mean cost of typeid=2 policy?
(tip: https://en.wikipedia.org/wiki/Geometric_mean )
a. select avg(cost)
from policy a
inner join incident b
on a.pid=b.pid and b.covered=true
where a.typeid=2;
b. select sum(cost)/sum(1.0)
from policy a
inner join incident b
on a.pid=b.pid and b.covered=true
where a.typeid=2;
c. select exp ( avg ( ln( cost ) ) )
from policy a
inner join incident b
on a.pid=b.pid and b.covered=true
where a.typeid=2;
d. select exp ( sum ( ln( cost ) ) ) / sum(1.0)
from policy a
inner join incident b
on a.pid=b.pid and b.covered=true
where a.typeid=2;
e. (write your own answer)
13. (5 points) Which single-event would be worst for the insurance company for termid=123
(which typeid would cost the most).
a. with stats as (
select typeid, sum(amnt) amnt, max( sum(amnt) ) over () mx
from policy
where termid=123
group by typeid
)
select *
from stats
where amnt = mx;
b. with stats as (
select typeid, rank() over (order by sum(amnt) desc) r
from policy
where termid=123
group by typeid
)
select *
from stats
where r=1;
c. select typeid
from policy
where termid=123
group by typeid
having sum(amnt) >= all( select sum(amnt) from policy group by typeid );
d. select typeid, sum(b.cost) tot
from policy a
inner join incident b
on a.pid = b.pid and b.covered = true
group by typeid
order by 2 desc
limit 1;
e. (write your own answer)
14. (5 points) Identify customers ages 30-45 who don't have typeid=234 insurance (term-life-insurance).
a. select a.*
from customer a
natural left outer join policy b
where b.typeid=234 and
extract(year from age(a.dob)) between 30 and 45 and
b.pid is null;
b. select a.*
from customer a
left outer join policy b
on a.cid=b.cid and b.typeid=234
where extract(year from age(a.dob)) between 30 and 45 and
b.pid is null;
c. select a.*
from customer a
inner join policy b
on a.cid=b.cid and b.typeid=234
where extract(year from age(a.dob)) between 30 and 45 and
b.pid is null;
d. select a.*
from customer a
full outer join policy b
on a.cid=b.cid and b.typeid=234
where extract(year from age(a.dob)) between 30 and 45 and
b.pid is null;
e. (write your own answer)
15. (5 points) Identify most expensive age group for the insurance company (across all policies and terms). Age groups: 10-25, 26-45, 46-65, >65.
a. with stats as (
select
case when age between 10 and 25 then '10-25'
when age between 26 and 45 then '26-45'
when age between 46 and 65 then '46-65'
when age > 65 then '>65'
end as agegrp, sum(c.cost) costs, max( sum(c.cost) ) over () mx
from (select a.*, extract(year from age(dob)) as age from customer a) a
inner join policy b on a.cid=b.cid
inner join incident c on b.pid=c.pid and c.covered=true
group by agegrp
)
select *
from stats
where costs = mx;
b. with stats as (
select
case when age between 10 and 25 then '10-25'
when age between 26 and 45 then '26-45'
when age between 46 and 65 then '46-65'
when age > 65 then '>65'
end as agegrp, sum(c.cost) costs, max( sum(c.cost) ) over () mx
from (select a.*, extract(year from age(dob)) as age from customer a) a
natural inner join policy b
natural inner join incident c
where c.covered=true
group by case when age between 10 and 25 then '10-25'
when age between 26 and 45 then '26-45'
when age between 46 and 65 then '46-65'
when age > 65 then '>65'
end
)
select *
from stats
where costs = mx;
c. with custage as (
select a.*, extract(year from age(dob)) as age
from customer a
),
custgrp as (
select a.*,
case when age between 10 and 25 then '10-25'
when age between 26 and 45 then '26-45'
when age between 46 and 65 then '46-65'
when age > 65 then '>65'
end as agegrp
from custage a
)
select agegrp,sum(c.cost) costs
from custgrp a
inner join policy b on a.cid=b.cid
inner join incident c on b.pid=c.pid and c.covered=true
group by agegrp
having sum(c.cost) >= all( select sum(c.cost) from incident);
d. with custage as (
select a.*, extract(year from age(dob)) as age
from customer a
),
custgrp as (
select a.*,
case when age between 10 and 25 then '10-25'
when age between 26 and 45 then '26-45'
when age between 46 and 65 then '46-65'
when age > 65 then '>65'
end as agegrp
from custage a
),
stats as (
select agegrp,sum(c.cost) costs, rank() over (order by sum(cost) desc) rnk
from custgrp a
inner join policy b on a.cid=b.cid
inner join incident c on b.pid=c.pid and c.covered=true
group by agegrp
)
select *
from stats
where rnk=1;
e. (write your own answer)
16. (5 points) Identify suspicious and potentially fraudulent customers; customers who have unusually high chance of incidents. For each customer, calculate number (not dollar amount, but count) of covered incidents, and pick out customers that are 4 standard deviations higher than the mean.
a. with stats as (
select a.cid, count(b.*) cnt,
avg(count(b.*)) over () avgcnt,
stddev(count(b.*)) over () sdcnt
from policy a
natural left outer join incident b
where b.covered = true
group by a.cid
)
select *
from stats
where cnt >= avgcnt+4*sdcnt;
b. with stats as (
select a.cid, count(b.*) cnt,
avg(count(b.*)) over () avgcnt,
stddev(count(b.*)) over () sdcnt
from policy a
left outer join incident b
on a.pid=b.pid and b.covered = true
group by a.cid
)
select *
from stats
where cnt >= avgcnt+4*sdcnt;
c. with stats as (
select a.cid, sum(case when b.pid is not null then 1 else 0 end) cnt
from policy a
left outer join incident b
on a.pid=b.pid and b.covered = true
group by a.cid
),
stats2 as (
select a.*,
avg(cnt) over () avgcnt,
stddev(cnt) over () sdcnt
from stats a
)
select *
from stats2
where cnt >= avgcnt+2*sdcnt;
d. with stats as (
select a.cid, sum(case when b.pid is not null then 1 else 0 end) cnt
from policy a
left outer join incident b
on a.pid=b.pid and b.covered = true
group by a.cid
),
stats2 as (
select
avg(cnt) avgcnt,
stddev(cnt) sdcnt
from stats a
)
select *
from stats cross join stats2
where cnt >= avgcnt+2*sdcnt;
e. (write your own answer)
17. (5 points) Identify suspicious and potentially fraudulent customers; more than 20 different customers (potentially with variation in names and dob), that are registered under the same address and who have a policy of the same type and term. (e.g. "Bob Johnson", "Bob Johns", "Bob Jonson", etc., taking out fire insurance on their house with same address).
a. select street1, stree2, city, state, zip, typeid, termid
from customer
natural inner join policy
group by street1, stree2, city, state, zip, typeid, termid
having count(*) > 20;
b. with stats as (
select a.*,
count(*) over (partition by street1, stree2, city, state, zip, typeid, termid
order by fname,lname) cnt
from customer a
inner join policy b
on a.cid=b.cid
)
select *
from stats
where cnt > 20;
c. select *
from customer
natural inner join (
select street1, stree2, city, state, zip, typeid, termid
from customer
natural inner join policy
group by street1, stree2, city, state, zip, typeid, termid
having count(distinct concat(fname,lname,dob)) > 20 ) a;
d. with addr as (
select street1, stree2, city, state, zip, typeid, termid, count(*) cnt
from customer a
inner join policy b using (cid)
group by street1, stree2, city, state, zip, typeid, termid
having count(*) > 20
)
select *
from customer
natural inner join addr;
e. (write your own answer)
18. (5 points) Below query is identical to:
select a.*,b.val c
from T1 a
left outer join T2 b
on a.key=b.key and a.val!=b.val
a. with TMP as (
select a.*,b.val c
from T1 a
inner join T2 b
on a.key=b.key
where a.val!=b.val
)
select a.*,b.c
from T1 a
left outer join TMP b
on a.key=b.key;
b. select a.*,b.val c
from T1 a
inner join T2 b
on a.key=b.key and
a.val!=b.val;
c. with TMP as (
select a.*,b.val c
from T1 a
left outer join T2 b
on a.key=b.key
where a.val!=b.val
)
select a.*
from TMP
where a.val!=b.val;
d. All of the above queries are identical.
e. None of the queries are identical to the question.
19. (5 points) Find customers who have more than 10 policies?
a. select cid,count(*)
from policy
group by cid
having count(*) > 10;
b. select a.cid,count(*)
from customer a
inner join policy b using(cid)
group by a.cid
where count(*) > 10;
c. select b.cid,count(*)
from customer a
left outer join policy b
on a.cid=b.cid
group by b.cid
having count(*) > 10;
d. select cid,sum(case when b.cid is not null then 1 else 0 end)
from customer a
left outer join policy b
using (cid)
group by cid
having sum(case when b.cid is not null then 1 else 0 end) >= 10;
e. (write your own answer)
20. (5 points) Find customers who have less than 5 policies.
a. select cid,count(*)
from policy
group by cid
having count(*) < 5;
b. with outerunion as (
select cid,count(*) cnt
from policy
group by cid
)
select *
from outerunion
where cnt < 5;
c. select a.cid,count(*)
from customer a
inner join policy b
using(cid)
group by a.cid
having count(*) < 5;
d. select cid,sum(case when b.cid is not null then 1 else 0 end)
from customer a
left outer join policy b
using (cid)
group by cid
having sum(case when b.cid is not null then 1 else 0 end) < 5;
e. (write your own answer)