
@OneCricketeer
Last active June 15, 2020 16:17
Revisions

  1. OneCricketeer renamed this gist Dec 7, 2018. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  2. OneCricketeer renamed this gist Aug 29, 2018. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  3. OneCricketeer created this gist Aug 29, 2018.
    56 changes: 56 additions & 0 deletions schema-backup.sh
    #!/usr/bin/env bash
    ######################################
    # Backup Schema Registry
    #
    # To Restore:
    # 0. Download if remotely stored.
    # 1. Extract : `tar -xjvf schemas.tar.bz2`
    #
    # 2. Inspect Logs & Errors : `cat schemas-err.txt`
    # 3. Inspect Schemas : `tail -n50 schemas.txt`
    # 4. Check how many schemas: `wc -l schemas.txt`
    #
    # 5. Load : `kafka-console-producer --broker-list $KAFKA --topic _schemas --property parse.key=true < schemas.txt`
    #######################################

    set -eu -o pipefail

    if [ $# -lt 2 ]; then
        echo "Usage: $0 <bootstrap-servers> <timeout-ms>"
        exit 1
    fi

    # Confluent version compatible with our brokers
    CP_VERSION=3.3.2

    SCHEMAS_TOPIC=_schemas
    STDOUT=schemas.txt
    STDERR=schemas-err.txt

    TIMEOUT_MS=$2
    NOW=$(date +"%Y%m%d_%H%M")

    echo "==> [${NOW}] Consuming ${SCHEMAS_TOPIC} with 'timeout-ms'=${TIMEOUT_MS}"
    # This is a naïve approach: every run re-reads the topic from the beginning,
    # which takes longer as the topic grows. Realistically, a consumer group that
    # commits offsets could track progress and consume only new records each run.
    # TODO: Find a way to detect when timeout-ms is too small (e.g. compare `wc -l` across daily runs)
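    # A sketch of the consumer-group alternative mentioned above (the group name
    # `schema-backup` is illustrative, not part of the original; with committed
    # offsets, each run would append only records produced since the last run):
    #
    #   kafka-console-consumer --bootstrap-server $1 --topic ${SCHEMAS_TOPIC} \
    #     --group schema-backup --property print.key=true \
    #     --timeout-ms ${TIMEOUT_MS} >> ${STDOUT}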
    docker run --rm -ti \
        -v "$PWD":/workdir \
        confluentinc/cp-kafka:${CP_VERSION} \
        bash -c "kafka-console-consumer --from-beginning --property print.key=true \
            --bootstrap-server $1 --topic ${SCHEMAS_TOPIC} \
            --timeout-ms ${TIMEOUT_MS} \
            1>/workdir/${STDOUT} \
            2>/workdir/${STDERR}"

    # For debugging
    cat ${STDERR} # Should say "Consumed x records", but it will also contain the expected TimeoutException
    echo "==> ${STDOUT} contains $(wc -l ${STDOUT} | awk '{print $1}') messages"
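
    # One possible answer to the timeout-ms TODO above (an assumption of mine,
    # not part of the original gist): log the record count per run and warn when
    # it shrinks, which can mean the consumer timed out before reaching the end
    # of the topic.
    check_count() {
        count=$(($(wc -l < "$1")))
        log=${2:-counts.log}
        if [ -f "$log" ]; then
            last=$(awk 'END {print $2}' "$log")
            if [ "$count" -lt "$last" ]; then
                echo "WARN: $count records < $last from the previous run; timeout-ms may be too small" >&2
            fi
        fi
        echo "$(date +%Y%m%d_%H%M) $count" >> "$log"
    }
    [ -f "${STDOUT:-schemas.txt}" ] && check_count "${STDOUT:-schemas.txt}" || true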

    echo "==> Compressing schemas"
    # Errors file is not removed so we can download and inspect it later
    tar -cjvf schema-registry-${NOW}.tar.bz2 ${STDOUT} ${STDERR} && rm ${STDOUT} # ${STDERR}

    # TODO: Upload to S3, for example
    # aws s3 cp schema-registry-${NOW}.tar.bz2 ${S3_BUCKET}/backup-schema-registry/
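
The restore steps from the header comment can be collected into a small companion script. This is a sketch only: the `schema_restore` name and argument layout are my own, and it assumes `kafka-console-producer` is on the local PATH (the backup above runs its consumer via Docker instead).

```shell
#!/usr/bin/env bash
# schema-restore.sh -- hypothetical companion to schema-backup.sh (not part of the gist)
set -eu -o pipefail

# Usage: schema_restore <bootstrap-servers> <archive.tar.bz2>
schema_restore() {
    if [ $# -lt 2 ]; then
        echo "Usage: schema_restore <bootstrap-servers> <archive.tar.bz2>" >&2
        return 1
    fi

    # 0/1. Extract the archive produced by schema-backup.sh
    tar -xjvf "$2"

    # 2-4. Inspect errors, the tail of the schemas, and the record count
    cat schemas-err.txt
    tail -n50 schemas.txt
    echo "Restoring $(wc -l < schemas.txt) records"

    # 5. Replay the records (keys included) back into _schemas
    kafka-console-producer --broker-list "$1" --topic _schemas \
        --property parse.key=true < schemas.txt
}
```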