Skip to content

Instantly share code, notes, and snippets.

@cobusc
Last active July 9, 2020 10:32
Show Gist options
  • Select an option

  • Save cobusc/4b582d9a6a48fd9273f2a321d34e8bc9 to your computer and use it in GitHub Desktop.

Select an option

Save cobusc/4b582d9a6a48fd9273f2a321d34e8bc9 to your computer and use it in GitHub Desktop.

Revisions

  1. cobusc revised this gist Jul 9, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion lookups.md
    Original file line number Diff line number Diff line change
    @@ -2,7 +2,7 @@

    ## 1. For very big sets, create a temporary table with an indexed column and do a SELECT with a JOIN.

    ## 2. For large sets, use VALUES. It will create a table like structure which can be JOINed wuth.
    ## 2. For large sets, use VALUES. It will create a table like structure which can be JOINed with.
    ```
    catalogue=> explain select productline_id from productline_tsin_product as ptp, (values (1), (7), (9), (4), (123), (1234)) as lookup where ptp.productline_id = lookup.column1;
    ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  2. cobusc revised this gist Jul 9, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion lookups.md
    Original file line number Diff line number Diff line change
    @@ -54,7 +54,7 @@ SELECT productline_id
    lookup
    WHERE ptp.productline_id = lookup.id;
    cursor.execute(sql, [1, 7, 9, 4, 123, 1234])
    cursor.execute(sql, {"ids": [1, 7, 9, 4, 123, 1234]})
    ```
    ## 4. IN ()
  3. cobusc revised this gist Jul 9, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion lookups.md
    Original file line number Diff line number Diff line change
    @@ -78,7 +78,7 @@ SELECT productline_id
    """.format(in_clause)
    ```
    I really dislike this. See "= ANY()" for an alternative approach.
    **I really dislike this. See "= ANY()" for an alternative approach.**
    ## 5. = ANY()
    ```
  4. cobusc created this gist Jul 9, 2020.
    105 changes: 105 additions & 0 deletions lookups.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,105 @@
    # Different options

    ## 1. For very big sets, create a temporary table with an indexed column and do a SELECT with a JOIN.

    ## 2. For large sets, use VALUES. It will create a table like structure which can be JOINed wuth.
    ```
    catalogue=> explain select productline_id from productline_tsin_product as ptp, (values (1), (7), (9), (4), (123), (1234)) as lookup where ptp.productline_id = lookup.column1;
    ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
    │ QUERY PLAN │
    ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
    │ Nested Loop (cost=0.56..39.84 rows=187 width=8) │
    │ -> Values Scan on "*VALUES*" (cost=0.00..0.08 rows=6 width=4) │
    │ -> Index Only Scan using productline_tsin_product_pkey on productline_tsin_product ptp (cost=0.56..6.32 rows=31 width=8) │
    │ Index Cond: (productline_id = "*VALUES*".column1) │
    └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    (4 rows)
    ```

    This can be cumbersome to build up though. Psycopg2 has an `execute_values()` function which makes it easy.

    ```python
    sql = """
    SELECT productline_id
    FROM productline_tsin_product AS ptp,
    (VALUES %s) AS lookup
    WHERE ptp.productline_id = lookup.column1
    """

    execute_value(cursor, sql, [1, 7, 9, 4, 123, 1234])
    ```

    ## 3. CTE (WITH-clause)
    ```
    catalogue=> explain with lookup as (select unnest('{1, 7, 9, 4, 123, 1234}'::int[]) AS id) select productline_id from productline_tsin_product as ptp, lookup where ptp.productline_id = lookup.id;
    ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
    │ QUERY PLAN │
    ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
    │ Nested Loop (cost=0.56..39.87 rows=187 width=8) │
    │ -> ProjectSet (cost=0.00..0.05 rows=6 width=4) │
    │ -> Result (cost=0.00..0.01 rows=1 width=0) │
    │ -> Index Only Scan using productline_tsin_product_pkey on productline_tsin_product ptp (cost=0.56..6.32 rows=31 width=8) │
    │ Index Cond: (productline_id = (unnest('{1,7,9,4,123,1234}'::integer[]))) │
    └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    ```
    I don't think there is any benefit to this option over the others.

    ```python
    sql = """
    WITH lookup AS (
    SELECT UNNEST(%(ids)s) AS id
    )
    SELECT productline_id
    FROM productline_tsin_product AS ptp,
    lookup
    WHERE ptp.productline_id = lookup.id;
    cursor.execute(sql, [1, 7, 9, 4, 123, 1234])
    ```
    ## 4. IN ()
    ```
    catalogue=> explain select productline_id from productline_tsin_product where productline_id in (1,7,9,4,123,1234);
    ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
    │ QUERY PLAN │
    ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
    │ Index Only Scan using productline_tsin_product_pkey on productline_tsin_product (cost=0.56..37.91 rows=187 width=8) │
    │ Index Cond: (productline_id = ANY ('{1,7,9,4,123,1234}'::bigint[])) │
    └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    (2 rows)
    ```
    There is nothing wrong with this query, but in applications people usually build up the IN clause using something like:
    ```python
    in_clause = ",".join(str(id) for id in [1, 7, 9, 4, 123, 1234])
    sql = """
    SELECT productline_id
    FROM productline_tsin_product
    WHERE productline_id IN ({})
    """.format(in_clause)
    ```
    I really dislike this. See "= ANY()" for an alternative approach.
    ## 5. = ANY()
    ```
    catalogue=> explain select productline_id from productline_tsin_product where productline_id = any('{1, 7, 9, 4, 123, 1234}');
    ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
    │ QUERY PLAN │
    ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
    │ Index Only Scan using productline_tsin_product_pkey on productline_tsin_product (cost=0.56..37.91 rows=187 width=8) │
    │ Index Cond: (productline_id = ANY ('{1,7,9,4,123,1234}'::bigint[])) │
    └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    (2 rows)
    ```
    This should be equivalent to "IN ()", but it has the benefit of taking an array, so in your application you can pass the array.
    ```python
    sql = """
    SELECT productline_id
    FROM productline_tsin_product
    WHERE productline_id = ANY(%(ids)s);
    """
    cursor.execute(sql, {"ids": [1, 7, 9, 4, 123, 1234]})
    ```