Skip to content

Instantly share code, notes, and snippets.

@mdeland
Last active March 23, 2020 03:33
Show Gist options
  • Select an option

  • Save mdeland/8faeb4e60c43b7c51fad to your computer and use it in GitHub Desktop.

Select an option

Save mdeland/8faeb4e60c43b7c51fad to your computer and use it in GitHub Desktop.

Revisions

  1. mdeland revised this gist Jan 25, 2016. 6 changed files with 6 additions and 6 deletions.
    2 changes: 1 addition & 1 deletion basic_query.csv
    Original file line number Diff line number Diff line change
    @@ -1,4 +1,4 @@
    date, num_customers, distinct_customers, total_bananas, total_revenue, revenue_per_sale
    2016-01-01, 345, 287, 564, 3689, 6.54
    2016-01-02, 364, 299, 582, 4080, 7.01
    ...
    ..., ..., ..., ..., ..., ...
    2 changes: 1 addition & 1 deletion histogram.csv
    Original file line number Diff line number Diff line change
    @@ -2,4 +2,4 @@ bucket_floor, count
    0, 1054
    5, 465
    10, 233
    ...
    ..., ...
    2 changes: 1 addition & 1 deletion histogram_label.csv
    Original file line number Diff line number Diff line change
    @@ -2,4 +2,4 @@ bucket_floor, bucket_name, count
    0, 0 to 5, 1054
    5, 5 to 10, 465
    10, 10 to 15, 233
    ...
    ..., ..., ...
    2 changes: 1 addition & 1 deletion joint.csv
    Original file line number Diff line number Diff line change
    @@ -3,4 +3,4 @@ wait_time_bucket, avg_revenue
    10, 8.43
    20, 9.01
    30, 7.50
    ...
    ..., ...
    2 changes: 1 addition & 1 deletion percentiles.csv
    Original file line number Diff line number Diff line change
    @@ -1,4 +1,4 @@
    date, percentile_25, percentile_50, percentile_75, avg
    2016-01-01, 18, 37, 75, 66
    2016-01-02, 19, 35, 77, 64
    ...
    ..., ..., ..., ..., ...
    2 changes: 1 addition & 1 deletion percentiles2.csv
    Original file line number Diff line number Diff line change
    @@ -1,4 +1,4 @@
    date, median
    2016-01-01, 37
    2016-01-02, 35
    ...
    ..., ...
  2. mdeland created this gist Jan 25, 2016.
    4 changes: 4 additions & 0 deletions basic_query.csv
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,4 @@
    date, num_customers, distinct_customers, total_bananas, total_revenue, revenue_per_sale
    2016-01-01, 345, 287, 564, 3689, 6.54
    2016-01-02, 364, 299, 582, 4080, 7.01
    ...
    10 changes: 10 additions & 0 deletions basic_query.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,10 @@
    SELECT
    date,
    count(*) as num_customers,
    count(distinct user_id) as distinct_customers,
    sum(bananas_sold) as total_bananas,
    sum(revenue) as total_revenue,
    avg(revenue) as revenue_per_sale
    FROM banana_sales
    GROUP BY date
    ORDER BY date;
    5 changes: 5 additions & 0 deletions histogram.csv
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,5 @@
    bucket_floor, count
    0, 1054
    5, 465
    10, 233
    ...
    6 changes: 6 additions & 0 deletions histogram.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,6 @@
    select
    floor(revenue/5.00)*5 as bucket_floor,
    count(*) as count
    from banana_sales
    group by 1
    order by 1;
    5 changes: 5 additions & 0 deletions histogram_label.csv
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,5 @@
    bucket_floor, bucket_name, count
    0, 0 to 5, 1054
    5, 5 to 10, 465
    10, 10 to 15, 233
    ...
    12 changes: 12 additions & 0 deletions histogram_label.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,12 @@
    select
    bucket_floor,
    CONCAT(bucket_floor, ' to ', bucket_ceiling) as bucket_name,
    count(*) as count
    from (
    select
    floor(revenue/5.00)*5 as bucket_floor,
    floor(revenue/5.00)*5 + 5 as bucket_ceiling
    from web_sessions_table
    ) a
    group by 1, 2
    order by 1;
    6 changes: 6 additions & 0 deletions histogram_naive.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,6 @@
    select
    revenue,
    count(*)
    from banana_sales
    group by revenue
    order by revenue;
    6 changes: 6 additions & 0 deletions joint.csv
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,6 @@
    wait_time_bucket, avg_revenue
    0, 10.87
    10, 8.43
    20, 9.01
    30, 7.50
    ...
    6 changes: 6 additions & 0 deletions joint.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,6 @@
    select
    floor(wait_time/10.00)*10 as wait_time_bucket,
    avg(revenue) as avg_revenue
    from banana_sales
    group by 1
    order by 1;
    4 changes: 4 additions & 0 deletions joint2.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,4 @@
    select
    corr(wait_time, revenue) as correlation,
    covar_samp(wait_time, revenue) as covariance
    from banana_sales;
    4 changes: 4 additions & 0 deletions percentiles.csv
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,4 @@
    date, percentile_25, percentile_50, percentile_75, avg
    2016-01-01, 18, 37, 75, 66
    2016-01-02, 19, 35, 77, 64
    ...
    12 changes: 12 additions & 0 deletions percentiles.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,12 @@
    SELECT
    date,
    percentile_cont (0.25) WITHIN GROUP
    (ORDER BY wait_time ASC) OVER(PARTITION BY date) as percentile_25,
    percentile_cont (0.50) WITHIN GROUP
    (ORDER BY wait_time ASC) OVER(PARTITION BY date) as percentile_50,
    percentile_cont (0.75) WITHIN GROUP
    (ORDER BY wait_time ASC) OVER(PARTITION BY date) as percentile_75,
    avg(wait_time) as avg -- for comparison
    FROM banana_sales
    GROUP BY date
    ORDER BY date;
    4 changes: 4 additions & 0 deletions percentiles2.csv
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,4 @@
    date, median
    2016-01-01, 37
    2016-01-02, 35
    ...
    25 changes: 25 additions & 0 deletions percentiles2.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,25 @@
    SELECT
    t1.date,
    t1.wait_time as median
    FROM (
    SELECT
    date,
    wait_time,
    ROW_NUMBER() OVER(ORDER BY wait_time PARTITION BY date) as row_num
    FROM banana_sales
    ) t
    JOIN (
    SELECT
    date,
    count(*) as total
    FROM banana_sales
    GROUP BY date
    ) t2
    ON
    t1.date = t2.date
    -- for simplicity, we take a simple solution when the list has an even length, to just choose one value
    WHERE t1.row_num =
    CASE when t2.total % 2 = 0
    THEN t2.total / 2
    ELSE (t2.total + 1) / 2
    END;