@tsusanto
Created November 9, 2017 19:00

Revisions

  1. tsusanto created this gist Nov 9, 2017.
    26 changes: 26 additions & 0 deletions download_from_kafka.sh
    @@ -0,0 +1,26 @@
    #!/bin/bash
    ##Tenny Susanto
    ##2017-11-09
    ##download messages from Kafka, partition by partition
    ##do not use this on a Kafka topic that is constantly receiving new messages:
    ##the last offsets are captured once up front, so messages that arrive later are not downloaded
    ##this should only be used for a manual pull of historical data loaded by the platform team (they dump the data into Kafka once and do not write to it again)

    ##This JAAS login config file contains the login/password for the Kafka server
    export KAFKA_OPTS="-Djava.security.auth.login.config=/path/jass.conf"

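    ##get first (earliest) offset for each partition; --time -2 requests the earliest offset
    ##awk keeps the partition and offset fields of each topic:partition:offset line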
    /usr/bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:10002 --topic yourtopic --time -2 | awk -F: '{ print $2 "\t" $3}' | sort -k1 -n > first_offset

    ##get last (latest) offset for each partition; --time -1 requests the latest offset
    /usr/bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:10002 --topic yourtopic --time -1 | awk -F: '{ print $2 "\t" $3}' | sort -k1 -n > last_offset

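    ##get the number of messages in each partition (last offset minus first offset)
    ##paste joins the two files line by line, so each row is: partition first_offset partition last_offset
    ##example: a row of "0 100 0 250" produces "0 150" in output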
    paste first_offset last_offset | awk '{print $1,$4-$2}' > output

    ##Get all the messages for each partition, one output file per partition
    while read -r partition rowcount
    do
        ##zero-pad the partition number so the output files sort in partition order
        filename=$(printf '%03d' "$partition")
        echo "$partition $rowcount $filename"
        /usr/bin/kafka-console-consumer --new-consumer --bootstrap-server broker1:9092,broker2:9092,broker3:9092 --topic yourtopic --from-beginning --max-messages "$rowcount" --partition "$partition" --consumer.config /path/consumer.cfg > "partition_${filename}"
    done < output
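
The per-partition count step in the script can be tried without a broker. The following is a minimal dry-run sketch, not part of the original gist: the sample lines imitate the `topic:partition:offset` format that the script's awk step assumes GetOffsetShell prints, and the function names `to_offsets` and `count_messages`, the temporary directory, and the sample offsets are all hypothetical.

```shell
#!/bin/bash
# Hypothetical dry run of the offset arithmetic; no Kafka connection is needed.
# The sample input imitates topic:partition:offset lines (an assumption).

# Convert "topic:partition:offset" lines on stdin into
# "partition<TAB>offset" lines, sorted numerically by partition.
to_offsets() {
  awk -F: '{ print $2 "\t" $3 }' | sort -k1 -n
}

# Join a first-offset file ($1) and a last-offset file ($2) line by line
# and print "partition count", where count = last offset - first offset.
count_messages() {
  paste "$1" "$2" | awk '{ print $1, $4 - $2 }'
}

workdir=$(mktemp -d)

# Made-up offsets for three partitions, deliberately out of order.
printf 'yourtopic:1:5\nyourtopic:0:100\nyourtopic:2:0\n' | to_offsets > "$workdir/first_offset"
printf 'yourtopic:1:5\nyourtopic:0:250\nyourtopic:2:42\n' | to_offsets > "$workdir/last_offset"

count_messages "$workdir/first_offset" "$workdir/last_offset"

rm -rf "$workdir"
```

With these sample offsets the sketch prints `0 150`, `1 0`, and `2 42`, one pair per line. Partition 1 has equal first and last offsets, so its count is 0. Both files must list the same partitions in the same order for `paste` to line them up, which is why both go through the same numeric sort.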