Sample CSV data ingested into the example flows by GenerateFlowFile:
ID, CITY_NAME, ZIP_CD, STATE_CD
001, CITY_A, 1111, AA
002, CITY_B, 2222, BB
003, CITY_C, 3333, CC
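In NiFi this sample is typically supplied through GenerateFlowFile's Custom Text property (with Data Format set to Text). For quick testing outside NiFi, the same records can be written to a local file with a heredoc; the file name sample_cities.csv below is an arbitrary choice, not part of the original flow:

# Recreate the sample records that GenerateFlowFile emits in the example flows,
# so downstream processing can be exercised without a running NiFi instance.
cat > sample_cities.csv <<'EOF'
ID, CITY_NAME, ZIP_CD, STATE_CD
001, CITY_A, 1111, AA
002, CITY_B, 2222, BB
003, CITY_C, 3333, CC
EOF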
A .pre-commit-config.yaml with basic file-hygiene hooks:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: trailing-whitespace
      - id: mixed-line-ending
      - id: check-added-large-files
        args: ['--maxkb=1000']
      - id: end-of-file-fixer
      - id: requirements-txt-fixer
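With this configuration saved as .pre-commit-config.yaml in the repository root, the hooks can be installed and exercised with the standard pre-commit CLI (a usage sketch, not part of the original snippet):

# Install the tool and register it as a git pre-commit hook.
pip install pre-commit
pre-commit install

# Run every configured hook against the whole repository, not just staged files.
pre-commit run --all-files

# Optionally bump the pinned rev (v3.2.0 above) to the latest tagged release.
pre-commit autoupdate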
#!/bin/bash
# Minimum TODOs on a per-job basis:
# 1. define name, application jar path, main class, queue and log4j-yarn.properties path
# 2. remove properties not applicable to your Spark version (Spark 1.x vs. Spark 2.x)
# 3. tweak num_executors, executor_memory (+ overhead), and backpressure settings

# the two most important settings:
num_executors=6
executor_memory=3g
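The header stops at the two sizing variables; a minimal sketch of how they would typically flow into the eventual spark-submit call on YARN follows. The job name, main class, queue, jar path, overhead and backpressure values are illustrative placeholders standing in for TODO #1, not values from the original script:

# Illustrative placeholders only: substitute your own name, class, queue and jar (TODO #1).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name "example-streaming-job" \
  --class com.example.MainClass \
  --queue default \
  --num-executors ${num_executors} \
  --executor-memory ${executor_memory} \
  --conf spark.yarn.executor.memoryOverhead=512 \
  --conf spark.streaming.backpressure.enabled=true \
  --files log4j-yarn.properties \
  example-app.jar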