Noah Kennedy (noahfkennedy) - GitHub Gists
@noahfkennedy
noahfkennedy / get_table_exists.sql
Created July 19, 2023 21:09
Check if table exists
{% macro get_table_exists(input_table, threshold) %}
    {# Try adapter to see if table exists #}
    {%- set table_exists = adapter.get_relation(database=var('project_id'),
                                                schema=test_schema,
                                                identifier=test_alias) is not none -%}
    {# If table doesn't exist, return FALSE #}
    {% if not table_exists %}
        {{ return(table_exists) }}
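The preview cuts off before the threshold logic. A minimal sketch of how such a macro could finish, assuming threshold is a minimum acceptable row count and that test_schema and test_alias are resolved elsewhere from input_table (both are assumptions, not taken from the gist):

    {% endif %}
    {# Hypothetical continuation, not from the gist: compare the table's row count to the threshold #}
    {% set row_count_query %}
        select count(*) as n from {{ input_table }}
    {% endset %}
    {% if execute %}
        {% set results = run_query(row_count_query) %}
        {% set row_count = results.columns[0].values()[0] %}
        {{ return(row_count >= threshold) }}
    {% endif %}
    {{ return(false) }}
{% endmacro %}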
@noahfkennedy
noahfkennedy / generate_schema_name.sql
Created July 19, 2023 20:46
Custom Schema Naming
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {#-- dbt defaults to concatenating the custom schema to the target schema --#}

{%- macro clean_date(dt) -%}
    {% if dt | length > 10 %}
        left({{ dt }}, 10)
    {% else %}
        {{ dt }}
    {% endif %}
{%- endmacro -%}
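The preview cuts generate_schema_name off inside its else branch. For reference, dbt's built-in version of this macro ends that branch by concatenating the custom schema onto the target schema; the documented default looks like the sketch below, while the gist's custom ending may differ (for example, using only the custom schema name):

        {#-- dbt's documented default ending; not necessarily the gist's custom version --#}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}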
@noahfkennedy
noahfkennedy / Example_Macro_Usage.sql
Created July 19, 2023 20:15
Example Macro Usage
{{
    config(
        materialized='table',
        tags=['example_test']
    )
}}

select {{ example_macro(1, 2) }} as example_column

{% macro example_macro(column1, column2) %}
    {{ column1 }} + {{ column2 }}
{% endmacro %}
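In a dbt project the macro definition would normally live in its own file under macros/ rather than alongside the model. Because dbt expands macros at compile time, the model then compiles to plain SQL; with the literal arguments shown it renders as:

select 1 + 2 as example_column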
@noahfkennedy
noahfkennedy / DataProc_inst.py
Created March 1, 2023 22:57
DataProc PySpark Example Base
from pyspark.sql import SparkSession
import os
import json

# Set up the Spark session
spark = SparkSession.builder \
    .master("yarn") \
    .config("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED") \
    .config("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED") \
@noahfkennedy
noahfkennedy / DataProc.yaml
Created March 1, 2023 22:53
DataProc YAML config - Complete
placement:
  managedCluster:
    clusterName: my-managed-cluster
    config:
      gceClusterConfig:
        zoneUri: us-central1-a
jobs:
- pysparkJob:
    fileUris:
    mainPythonFileUri: your_pyspark_script.py
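A template like this is typically run as a Dataproc workflow. One way to do that, assuming the file is saved as template.yaml and the region is us-central1 (both placeholders):

gcloud dataproc workflow-templates instantiate-from-file \
    --file=template.yaml \
    --region=us-central1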
@noahfkennedy
noahfkennedy / DataProc_Params.yaml
Last active March 2, 2023 15:35
DataProc YAML Config - Parameters
parameters:
- name: MY_VAR1
  fields:
  - jobs['STEP_NAME_123'].pysparkJob.properties['spark.executorEnv.MY_VAR1']
- name: MY_VAR2
  fields:
  - jobs['STEP_NAME_123'].pysparkJob.properties['spark.executorEnv.MY_VAR2']
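Parameter values are supplied when the workflow is instantiated. A sketch assuming the template has already been imported under the name my-template (the template name and values are placeholders):

gcloud dataproc workflow-templates instantiate my-template \
    --region=us-central1 \
    --parameters=MY_VAR1=value1,MY_VAR2=value2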
@noahfkennedy
noahfkennedy / DataProc_Job.yaml
Last active March 2, 2023 15:36
DataProc YAML config - job
jobs:
- pysparkJob:
    fileUris:
    mainPythonFileUri: your_pyspark_script.py
    properties:
      spark.executorEnv.MY_VAR1: 'default_value'
      spark.executorEnv.MY_VAR2: 'default_value'
  stepId: STEP_NAME_123
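Because spark.executorEnv.* entries are ordinary Spark configuration properties, the PySpark script can read them back from its own session. A small sketch, assuming spark is the session built in DataProc_inst.py above:

# Read the parameterized value from the job's Spark configuration
my_var1 = spark.conf.get("spark.executorEnv.MY_VAR1", "default_value")

# Inside executor-side code (e.g. a UDF), the same value is also exposed
# as the environment variable MY_VAR1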
@noahfkennedy
noahfkennedy / DataProc_Cluster.yaml
Created March 1, 2023 22:45
DataProc YAML config - Cluster
placement:
  managedCluster:
    clusterName: my-managed-cluster
    config:
      gceClusterConfig:
        zoneUri: us-central1-a
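The snippet above pins only the zone. In practice a managed cluster usually also sets machine sizes and an image version; an illustrative extension of the config block using standard Dataproc cluster-config fields (all values are placeholders, not part of the gist):

      masterConfig:
        numInstances: 1
        machineTypeUri: n1-standard-4
      workerConfig:
        numInstances: 2
        machineTypeUri: n1-standard-4
      softwareConfig:
        imageVersion: '2.1'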