`Jennifer Zhao` | `SQL Problems Collection` | `2018-02-22 22:14:27`

<!-- MarkdownTOC -->

- [Comments I](#comments-i)
  - [Calculate the average comments for the users with >= 2 posts, and `each` post has comments greater or equal to 40](#calculate-the-average-comments-for-the-users-with--2-posts-and-each-post-has-comments-greater-or-equal-to-40)
- [Comments II](#comments-ii)
  - [What is the distribution of comments?](#what-is-the-distribution-of-comments)
  - [Now what if content_type becomes {comment, video, photo, article}，what is the comment distribution for each content type?](#now-what-if-contenttype-becomes-comment-video-photo-article，what-is-the-comment-distribution-for-each-content-type)
- [Messages](#messages)
  - [What can we get some insights from this table?](#what-can-we-get-some-insights-from-this-table)
  - [Write a query about the distribution of number of conversations among users on someday. Before we run any SQL, what is your gut sense that what the distribution will look like? Why?](#write-a-query-about-the-distribution-of-number-of-conversations-among-users-on-someday-before-we-run-any-sql-what-is-your-gut-sense-that-what-the-distribution-will-look-like-why)
  - [Write a query that we can find the top partner who sends the most number of messages to each user. And then add a outer query to calculate the following ratio:](#write-a-query-that-we-can-find-the-top-partner-who-sends-the-most-number-of-messages-to-each-user-and-then-add-a-outer-query-to-calculate-the-following-ratio)
- [Friends](#friends)
  - [Calculate the overall friend accept rate within a time range.](#calculate-the-overall-friend-accept-rate-within-a-time-range)
  - [Who has most friends?](#who-has-most-friends)
- [Friend II](#friend-ii)
  - [Generate friend request acceptance rate](#generate-friend-request-acceptance-rate)
  - [Generate the friend request acceptance rate for people who accept within 24 hours.](#generate-the-friend-request-acceptance-rate-for-people-who-accept-within-24-hours)
- [CTR](#ctr)
- [CTR II](#ctr-ii)
  - [Assuming ctr is defined as total number clicks / total number of impressions \(not counting each unique user’s action\)](#assuming-ctr-is-defined-as-total-number-clicks--total-number-of-impressions-not-counting-each-unique-user’s-action)
- [User Status](#user-status)
  - [Assume two tables, Tracking {user, state} and Day {user}](#assume-two-tables-tracking-user-state-and-day-user)
- [Notifications](#notifications)
  - [How many users turned feature on till now?](#how-many-users-turned-feature-on-till-now)
  - [How many users have ever turned the feature on?](#how-many-users-have-ever-turned-the-feature-on)
  - [In a table that tracks the status of every user every day, how would you add today’s data to it?](#in-a-table-that-tracks-the-status-of-every-user-every-day-how-would-you-add-today’s-data-to-it)
- [Recommendations](#recommendations)
  - [Write a SQL that makes recommendations using the pages that your friends liked. It should not recommend pages you already liked.](#write-a-sql-that-makes-recommendations-using-the-pages-that-your-friends-liked-it-should-not-recommend-pages-you-already-liked)
- [Phone Logins](#phone-logins)
  - [How many login_confirmation was sent by different carrier in different country yesterday?](#how-many-loginconfirmation-was-sent-by-different-carrier-in-different-country-yesterday)
  - [How to analyze why the number of logins dropped.](#how-to-analyze-why-the-number-of-logins-dropped)
- [Article Views](#article-views)
  - [How many article authors have never viewed their own article?](#how-many-article-authors-have-never-viewed-their-own-article)
  - [How many members viewed more than one articles on 2017-08-01](#how-many-members-viewed-more-than-one-articles-on-2017-08-01)
- [Companies](#companies)
  - [count members who ever moved from Microsoft to Google](#count-members-who-ever-moved-from-microsoft-to-google)
  - [count members who directly moved from Microsoft to Google? \(Microsoft -- Linkedin -- Google doesn't count\)](#count-members-who-directly-moved-from-microsoft-to-google-microsoft----linkedin----google-doesnt-count)
- [Continents](#continents)
  - [find the country with largest population in each continent, with strictly output: continent, country, population.](#find-the-country-with-largest-population-in-each-continent-with-strictly-output-continent-country-population)
  - [now for each continent, find the country with largest % of population in given continent. write SQL, then write Python.](#now-for-each-continent-find-the-country-with-largest-%-of-population-in-given-continent-write-sql-then-write-python)
- [Projects](#projects)
  - [Find the number of unique employees per project per month?](#find-the-number-of-unique-employees-per-project-per-month)
  - [For each employee, a list of projects he/she works on Write a script/function/else that reads data from CSV file and creates a data structure that stores, for each project, a list of employees who work on it.](#for-each-employee-a-list-of-projects-heshe-works-on-write-a-scriptfunctionelse-that-reads-data-from-csv-file-and-creates-a-data-structure-that-stores-for-each-project-a-list-of-employees-who-work-on-it)
- [Pivot/ Unpivot](#pivot-unpivot)
  - [unpivot](#unpivot)
  - [pivot the table above](#pivot-the-table-above)
- [Ads](#ads)
  - [The fraction of advertiser has at least 1 conversion?](#the-fraction-of-advertiser-has-at-least-1-conversion)
  - [What metrics would you show to advertisers \(ROI\)?](#what-metrics-would-you-show-to-advertisers-roi)
- [Student](#student)
  - [What was the overall attendance rate for the school district yesterday?](#what-was-the-overall-attendance-rate-for-the-school-district-yesterday)
  - [Which grade level currently has the most students in this school district?](#which-grade-level-currently-has-the-most-students-in-this-school-district)

<!-- /MarkdownTOC -->


### Comments I

| name | posts  | comments   |
|------|--------|------------|
| u1     | page1  | 90       |
| u1     | page2  | 50       |
| u1     | page3  | 40       |
| u2     | page2  | 55       |
| u2     | page4  | 45       |
| u4     | page4  | 30       |
| u4     | page3  | 40       |
| u3     | page2  | 100      |

``` sql
CREATE TABLE comments (
    name            varchar(80),
    posts           varchar(80),
    comments         int
);

INSERT INTO comments VALUES ('u1', 'page1', '90');
INSERT INTO comments VALUES ('u1', 'page2', '50');
INSERT INTO comments VALUES ('u1', 'page3', '40');
INSERT INTO comments VALUES ('u2', 'page2', '55');
INSERT INTO comments VALUES ('u2', 'page4', '45');
INSERT INTO comments VALUES ('u4', 'page4', '30');
INSERT INTO comments VALUES ('u4', 'page3', '40');
INSERT INTO comments VALUES ('u3', 'page2', '100');

-- or you can do 

copy comments from '/Users/jennifer/dev/sql_data_exercise/comments.csv' CSV HEADER
```

#### Calculate the average comments for the users with >= 2 posts, and `each` post has comments greater or equal to 40

``` sql
-- solution
select a.name, round(avg(a.comments),2) from comments a
inner join

-- get users who has more than two posts and each posts with more than 40 comments
(select name from comments
where comments >= 40
group by name
having count(distinct posts) >= 2) b

on a.name = b.name
group by a.name;
```

### Comments II

table: content_id | content_type (comment/ post) | target_id

If it is comment，target_id is the userid who posts it.

If it is post, then target_id is NULL.


#### What is the distribution of comments?


``` sql
select cnt, count(cnt) as freq
from
    (select content_id, count(target_id) cnt 
    from table 
     where content_type = 'comment'
    group by content_id) a 
group by cnt;
```

#### Now what if content_type becomes {comment, video, photo, article}，what is the comment distribution for each content type?

``` sql

select content_type, cnt, count(cnt) as freq
from (
    select a.content_id, a.content_type, count(b.target_id) cnt
    from table a
    group by a.content_id, a.content_type) tmp
group by content_type
order by content_type, cnt;

```

### Messages

table: date | u1 | u2 | n_msg

n_msg: the number of messsages between one unique user pair at someday.


#### What can we get some insights from this table?

user activities, represent closeness.

#### Write a query about the distribution of number of conversations among users on someday. Before we run any SQL, what is your gut sense that what the distribution will look like? Why?

``` sql
select conversations, count(u1) freq
from
(select u1, count(distinct u2) conversations from table1
where n_msg > 0 and date = SOMEDAY
group by u1) a
group by conversations
order by conversations
```

very skew data on left tail

#### Write a query that we can find the top partner who sends the most number of messages to each user. And then add a outer query to calculate the following ratio:

sum(n_msg_with_top_partners) / sum(n_msg_with_all_contacts)


```
select cast(sum(sum_top) as float)/(select sum(n_msg) from table) as fraction

from (
-- top partners
select u1, max(sum_n_msg) sum_top from 
(select u1, u2, sum(n_msg) sum_n_msg from table1
group by u1, u2) tmp
group by u1
) tmp2

```

### Friends

table friending: action_id, target_id, action（accept, request, unfriend), date, time

#### Calculate the overall friend accept rate within a time range.

``` sql

select sum(accept)/cast(sum(request) as float) from

-- accept

(select a.action_id, count(a.target_id) accept from table1 a, table2 b
where a.target_id = b.action_id
      and a.action ='accept'
      and b.action ='request'
) a,
-- request

(select action_id, count(target_id) request from table1
where action = 'request') b;
```

#### Who has most friends?

``` sql
select user_id, sum(cnt) as n_friend
from (
    (select action_id as user_id,
            sum(case when accept = 'accept' then 1
            when accept = 'unfriend' then -1
            else 0 end) as cnt
    from table)
    union all
    (select target_id as user_id, 
            sum(case when accept = 'accept' then 1
            when accept = 'unfriend' then -1
            else 0 end) as cnt
    from table)
) a
group by user_id
order by n_friend desc
limit 1
```

### Friend II

Table: friending (date | time | action | actor_id | target_id)

action = {‘send_request’,‘accept_request’}

#### Generate friend request acceptance rate

``` sql

select cast(sum(accept) as float)/sum(request) from
-- accept
(
select a.actor_id, count(distinct b.target_id) accept from table1 a, table1 b
where a.target_id = b.actor_id
     and a.action = 'accept_request'
     and b.action = 'send_request'
group by a.actor_id) a,

-- request 
(
select actor_id, count(distinct target_id) request from table1
where action = 'send_request'
group by actor_id) b;
```

#### Generate the friend request acceptance rate for people who accept within 24 hours.

``` sql
select cast(sum(accept) as float)/sum(request) from
-- accept
(
select a.date, a.actor_id, count(distinct b.target_id) accept from table1 a, table1 b
where a.target_id = b.actor_id
     and a.action = 'accept_request'
     and b.action = 'send_request'
     and datepart('day', a.date-b.date) <= 1
group by a.date, a.actor_id) a,

-- request 
(
select date, actor_id, count(distinct target_id) request from table1
where action = 'send_request'
group by date, actor_id) b

where datepart('day', a.date-b.date) <= 1;
```

### CTR

table1: time, user_id, app, event（impression，click, null).

If the user saw it and clicked it, then the event is click. If the user saw it but did not click, then the event is impression. The event is null, which means the user did not see it.

 #### Get the click through rate.
 
 ``` sql
 select a.app, COALESCE(b.clicks,0)/cast(a.impression,float) from
 -- impression
 (select app, count(user_id) impression
 from table1
 where event is not null
 group by app) a,
 -- clicks
 left join
 (select app, count(user_id) clicks
 from table1
 where event = 'click'
 group by app) b
 
 on a.app = b.app
 
 ```
 
 #### What if CTR is over 100%? Please justify any reason that could cause this problem.
 
 For each impression, we might have more than one click. So we need choose the right metric to represent the click through rate.

Here we want to clarify the difference between CTR (click through rate) and CTP (click through probability).

When calculating we need to make sure click is from the users who viewed it.
 
 ``` sql
select a.app, COALESCE(a.clicks,0)/cast(b.impression as float) from
-- clicks
(select a.app, count(distinct a.user_id) clicks from table1 a
inner join table1 b
on a.app = b.app and a.user_id = b.user_id
where a.event = 'click' and b.event = 'impression'
group by a.app) a

left join
 -- impression
 (select app, count(user_id) impression
 from table1
 where event is not null
 group by app) b

on a.app = b.app
 ```

### CTR II

#### Assuming ctr is defined as total number clicks / total number of impressions (not counting each unique user’s action)

``` sql
select
    appid,
    total_click_ct / total_imp_ct as ctr
from (
    select
        appid,
        count(distinct case when flag = 'imp' then 1 else 0 end) as total_imp_ct,
        count(distinct case when flag = 'click' then 1 else 0 end) as total_click_ct,
    from table
    where ts > x and ts < y
    group by appid) table2
order by ctr desc;

```

### User Status

Given a table that each day shows who was active in the system and a table that tracks ongoing user status, write a procedure that will take each day’s active table and pass it into the ongoing daily tracking table.

Possible states are:

user stayed (yesterday yes, today yes)

user churned (yesterday yes, today no)

user revived (yesterday no, today yes)

user new (yesterday null, today yes)

Note: you’ll want to spot and account for the undefined state.

#### Assume two tables, Tracking {user, state} and Day {user}

check churn | new | stayed | revived

``` sql
with tmp as (
select case when Tracking.user is not null then Tracking.user else Day.user end as user,
       case when Day.user is null then "churned"
            when Tracking.state is null then "new"
            when Tracking.state <> "churned" then "stayed"
            when Tracking.state = "churned" then "revived"
            else "check" end as state
from Tracking 
full outer join Day
on Tracking.user = Day.user);
```

### Notifications

There is a table that tracks every time a user turns a feature on or off, with columns user_id, action (“on” or “off), date, and time.

user_id | action | date | time

#### How many users turned feature on till now?

``` sql
select count(distinct user_id) from 
(select user_id, sum(case when action is 'on' then 1
                     when action is 'off' then -1
                     else 0 end as status)
from table1
group by user_id) tmp
where status > 0;

```

#### How many users have ever turned the feature on?

``` sql
select count(distinct user_id) from table1
where action = 'on';
```

#### In a table that tracks the status of every user every day, how would you add today’s data to it?

- historical_data: user_id, action (unique pair)

- today: user_id, action

```
with tmp as (
select user_id,
case when on_cnt - off_cnt = 1 then 'on' else 'off' end as action
from 
    (select user_id, 
    sum(case when action='on' then 1 else 0) on_cnt,
    sum(case when action='off' then 1 else 0) off_cnt
    from 
        (select user_id, action from historical_data
        union all
        select user_id, action from today) a
    group by user_id) b);
```

### Recommendations

two tables

friends_table: user_id | friend_id

pages_table: user_id | post_id

#### Write a SQL that makes recommendations using the pages that your friends liked. It should not recommend pages you already liked.

``` sql
select 
    a.user_id, 
    b.post_id
from friends_table a
join pages_table b
on a.friends_id = b.user_id
where (a.user_id, b.post_id) not in (select user_id, post_id from pages_table)
order by a.user_id, b.post_id
```

### Phone Logins

attempts: date | country | carrier | phone_no | type (‘login_confirmation’| ‘notification’)

logins: date | phone_no


#### How many login_confirmation was sent by different carrier in different country yesterday?

``` sql
select country, carrier, count(phone_no)
from attempts
where type = 'login_confirmation'
      and datepart('day', now() - date) <= 1
group by country, carrier;
```

#### How to analyze why the number of logins dropped. 

plot it and see how carrier perform in each country by date

``` sql
select 
    x.date, 
    x.country, 
    x.carrier, 
    round(y.total_logins/x.total_attempts, 2) as login_rate
from
    (select date, country, carrier, count(distinct phone_no) total_attempts 
    from attempts
    where type = 'login_confirmation'
    group by date, country, carrier
    ) x
join
    (select a.date, a.country, a.carrier, count(distinct a.phone_no) total_logins 
    from attempts a
    join logins b
    on a.date = b.date and a.phone_no = b.phone_no
    where a.type = 'login_confirmation'
    group by a.date, a.country, a.carrier
    ) y
on x.date=y.date 
and x.country=y.country 
and x.carrier=y.carrier
```

### Article Views

table name: article_views

| date | viewer_id | article_id | author_id | 
|----|----|----|----|                   
|2017-08-01|  123 | 456| 789|                
|2017-08-02|  432 | 543| 654|
|2017-08-01|  789 | 456| 789|
|2017-08-03|  567 | 780| 432|
|2017-08-03|  567 | 780| 432|
|2017-08-01|789|457   |789


``` sql
CREATE TABLE article_views (
    date            timestamp,
    viewer_id       int,
    article_id      int,
    author_id       int
);
 
INSERT INTO article_views VALUES ('2017-08-01',123,	456	,789);
INSERT INTO article_views VALUES ('2017-08-02',432	,543,	654);
INSERT INTO article_views VALUES ('2017-08-01',789,	456	,789);
INSERT INTO article_views VALUES ('2017-08-03',567,	780,	432);
INSERT INTO article_views VALUES ('2017-08-01',789, 457    ,789);
```

#### How many article authors have never viewed their own article? 

``` sql

-- solution
select count(distinct a.author_id) from article_views a
where a.author_id not in 
(select a.author_id from article_views a
where a.author_id = a.viewer_id);
```

#### How many members viewed more than one articles on 2017-08-01

``` sql
-- solution
select viewer_id from article_views
where date = TIMESTAMP '2017-08-01'
group by viewer_id
having count(article_id) > 1;
```

### Companies

table
member_id, company_name, year_start

``` sql
CREATE TABLE companies (
    member_id            int,
    company_name       varchar(80),
    year_start      int
);

INSERT INTO companies VALUES (1,'Google', 1990);
INSERT INTO companies VALUES (1,'Microsoft', 2000);
INSERT INTO companies VALUES (2,'Microsoft', 2000);
INSERT INTO companies VALUES (2,'Google', 2001);
INSERT INTO companies VALUES (3,'Microsoft', 1997);
INSERT INTO companies VALUES (3,'Google', 1998);
INSERT INTO companies VALUES (4,'Microsoft', 1997);
INSERT INTO companies VALUES (4,'LinkedIn', 1998);
INSERT INTO companies VALUES (4,'Google', 2000);


```

#### count members who ever moved from Microsoft to Google

``` sql

select count(distinct a.member_id) from companies a, companies b
where a.member_id = b.member_id
      and a.year_start < b.year_start
      and a.company_name = 'Microsoft'
      and b.company_name = 'Google';
```

#### count members who directly moved from Microsoft to Google? (Microsoft -- Linkedin -- Google doesn't count)

``` sql

-- solution 1
select count(distinct a.member_id) from companies a, companies b
where a.member_id = b.member_id
      and a.member_id = b.member_id
      and a.year_start < b.year_start
      and a.company_name = 'Microsoft'
      and b.company_name = 'Google'
      and a.member_id not in
                        (select distinct a.member_id 
                         from companies a, companies b, companies c
                         where a.member_id = b.member_id
                              and a.member_id = b.member_id
                              and a.year_start < b.year_start
                              and c.year_start > a.year_start
                              and c.year_start < b.year_start
                              and a.company_name = 'Microsoft'
                              and b.company_name = 'Google');

-- solution 2
select count(distinct member_id) from (
  select member_id, 
         company_name as current_company_name, 
         year_start, 
         lag(company_name) over (partition by member_id order by          year_start asc) as previous_company_name
  from companies
  ) a
where previous_company_name = 'Microsoft' and current_company_name = 'Google';
```


Note:
    
    lag(value any [, offset integer [, default any ]])	same type as value	returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null

### Continents

a table with: continent, country, population

#### find the country with largest population in each continent, with strictly output: continent, country, population. 

Consider corner case that two country have same largest population in the same continent

``` sql
CREATE TABLE continents (
    continent          varchar(80),
    country            varchar(80),
    population         int
);

INSERT INTO continents VALUES ('Asia','China', 100);
INSERT INTO continents VALUES ('Asia','Inida', 100);
INSERT INTO continents VALUES ('Africa','South Africa', 50);
INSERT INTO continents VALUES ('Africa','Egypt', 20);
INSERT INTO continents VALUES ('North America','USA', 50);
INSERT INTO continents VALUES ('North America','Canada', 50);
```

``` sql
-- solution 1
select a.country, b.continent, a.population from continents a
inner join (
    select continent, max(population) max_p from continents
    group by continent
) b
on a.continent = b.continent
and a.population = b.max_p;

-- solution 2

select a.country, a.continent, a.population from continents a
inner join 
(select country, rank() over (partition by continent order by population desc ) as rank
from continents) b
on a.country = b.country
where rank = 1;
```

Note:

    Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after regular aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.

#### now for each continent, find the country with largest % of population in given continent. write SQL, then write Python.

``` sql

select a.country, a.continent, round(a.population/c.total*100) ratio_percent from continents a
inner join 
        (select country, rank() over (partition by continent order by population desc ) as rank
        from continents) b
on a.country = b.country
inner join 
        (select continent, CAST(sum(population) as float) total from continents
         group by continent) c
on a.continent = c.continent
where rank = 1 and a.continent = GIVEN_VALUE;

```

### Projects

SQL Given tables:

employees(id, unixname, team, role, days_since_started)

projects(id, name,….)

commits(id, file_path, proj_id, auth_id, timestamp)

#### Find the number of unique employees per project per month?

``` sql
select 
    p.id, 
    extract(month from c.timstamp) as month,
    count(distinct e.id)
from commits c
join employees e
on c.auth_id = e.id
join projects p
on c.proj_id = p.id
group by p.id, month

```

#### For each employee, a list of projects he/she works on Write a script/function/else that reads data from CSV file and creates a data structure that stores, for each project, a list of employees who work on it.

```
john_doe, android, ios, infra 
bob_law, is, backend 
jane_doe, frontend 
..., ..., ...
```
``` python
import collections
projects = collections.defaultdict(list)

with open("test.csv") as df:
    for line in df:
        proj = line.split(",")
        name = proj[0]
        for p in proj[1:]:
            projects[p].append(name)
```

### Pivot/ Unpivot

|date     |qty_prod_a  |qty_prod_b  |qty_prod_c|
|---------|------------|------------|---------|                   
|1/1/2013 |100         |200         |300      |
|1/2/2013 |101         |0           |301      |
|1/3/2013 |102         |202         |302      |

``` sql
CREATE TABLE products (
    date          timestamp,
    qty_prod_a    int,
    qty_prod_b    int,
    qty_prod_c    int
);

INSERT INTO products VALUES ('1/1/2013', 100,200,300);
INSERT INTO products VALUES ('1/2/2013', 101,0,301);
INSERT INTO products VALUES ('1/3/2013', 102,202,302);
```

#### unpivot

- you have a table t1 with the quantity of product A, B, and C sold per day, as shown above
- there are only 3 possible products in the table: A, B, and C
- write SQL code to reformat the data as shown below
- the resulting data should be in 3 columns: {date, product name, quantity sold}

``` sql
select * from (
select date, 'a' as product, qty_prod_a as quantity_sold from products
union 
select date, 'b' as product, qty_prod_b as quantity_sold from products
union
select date, 'c' as product, qty_prod_c as quantity_sold from products
    ) tmp
order by date, product;
```

#### pivot the table above

``` sql
create table products1 as (
select * from (
select date, 'a' as product, qty_prod_a as quantity_sold from products
union 
select date, 'b' as product, qty_prod_b as quantity_sold from products
union
select date, 'c' as product, qty_prod_c as quantity_sold from products
    ) tmp
order by date, product);

SELECT
  date,
  SUM(CASE product WHEN 'a' THEN quantity_sold ELSE 0 END) AS "product.A",
  SUM(CASE product WHEN 'b' THEN quantity_sold ELSE 0 END) AS "product.B",
  SUM(CASE product WHEN 'c' THEN quantity_sold ELSE 0 END) AS "product.C"
FROM products1
GROUP BY date;
```

### Ads

two tables

- adv_info: advertiser_id | ad_id | spend: The Advertiser pay for this ad

- ad_info: ad_id | user_id | price: The user spend through this ad (Assume all prices in this column >0)


``` sql

CREATE TABLE adv_info (
    advertiser_id    int,
    ad_id    int,
    spend    float
);

INSERT INTO adv_info VALUES (10, 200,300);
INSERT INTO adv_info VALUES (11, 100,1000);
INSERT INTO adv_info VALUES (13, 400,3000);
INSERT INTO adv_info VALUES (14, 500,5000);

CREATE TABLE ad_info (
    ad_id    int,
    user_id    int,
    price    float
);

INSERT INTO ad_info VALUES (200, 4,30);
INSERT INTO ad_info VALUES (100, 4,100);
INSERT INTO ad_info VALUES (400, 4,300);
INSERT INTO ad_info VALUES (200, 10,31);
INSERT INTO ad_info VALUES (100, 10,110);
INSERT INTO ad_info VALUES (400, 10,310);
```

#### The fraction of advertiser has at least 1 conversion?

``` sql
select cast(cnt_1 as float)/total conversion_ratio from
(select count(distinct a.advertiser_id) cnt_1  from adv_info a
inner join
    (select ad_id from ad_info) b
on a.ad_id = b.ad_id) tmp1,

(select count(distinct a.advertiser_id) total  from adv_info a) tmp2;
```

#### What metrics would you show to advertisers (ROI)?

``` sql

select a.advertiser_id, a.ad_id, b.revenue/a.spending roi from
(select advertiser_id , ad_id, sum(spend) spending from adv_info
group by advertiser_id, ad_id) a,

(select ad_id, sum(price) revenue from ad_info
group by ad_id) b
where a.ad_id = b.ad_id;

```

### Student

- attendance: date | student_id | attendance

- student: student_id | school_id | grade_level | date_of_birth | hometown

``` sql
CREATE TABLE attendance (
    date    timestamp,
    student_id    int,
    attendance    int -- 0, 1
);

INSERT INTO attendance VALUES ('2018/01/02', 2,1);
INSERT INTO attendance VALUES ('2018/01/02', 1,1);
INSERT INTO attendance VALUES ('2018/01/02', 4,1);
INSERT INTO attendance VALUES ('2018/02/02', 5,1);
INSERT INTO attendance VALUES ('2018/02/22', 1,1);
INSERT INTO attendance VALUES ('2018/02/22', 4,1);
INSERT INTO attendance VALUES ('2018/02/22', 5,1);


CREATE TABLE student (
    student_id    int,
    school_id    int,
    grade_level int,
    date_of_birth timestamp,
    hometown varchar(80)        
);

INSERT INTO student VALUES (1,100, 4 ,'1994/01/02','san francisco');
INSERT INTO student VALUES (2,100, 4 ,'1994/01/02','san francisco');
INSERT INTO student VALUES (3,100, 4 ,'1994/01/02','san francisco');
INSERT INTO student VALUES (4,100, 4 ,'1994/01/02','san francisco');
```

#### What was the overall attendance rate for the school district yesterday?

```
select cast(count(distinct student_id) as float)/(select count(distinct student_id) from student) att_rate from attendance
where date_part('day', now() - date) <= 1
and attendance = 1;
```

#### Which grade level currently has the most students in this school district?
 
 ``` sql
 with tmp as (select grade_level, count(distinct student_id) cnt from student
 group by grade_level
 order by cnt desc)
 
select grade_level from tmp
where cnt = (select max(cnt) from tmp);
 ```

 #### Which school had the average highest attendance rate? the lowest?
 
 ``` sql
 -- take avarage
 -- daily attendance per school
 -- number of students per school
 
 
 select c.school_id, avg(att_rate_per_school_per_day) as avg_rate
from 
    (select a.date, a.school_id, a.att_num/b.total_num as att_rate_per_school_per_day
    from 
        -- date, school, att_num
        (select t1.date, t2.school_id, cast(count(distinct t1.student_id) as float) as att_num 
        from attendance t1
        right join student t2 
        on t1.student_id = t2.student_id 
        where t1.attendance = 1
        group by t1.date, t2.school_id ) a 
    join 
        -- school, total_num
        (select school_id, count(distinct student_id) as total_num
        from student
        group by school_id) b 
    on a.school_id = b.school_id) c 
group by c.school_id 
order by avg_rate desc;
 ```