Revisions

  1. @tobilg revised this gist on Mar 14, 2016. 1 changed file with 5 additions and 1 deletion (custom_s3_endpoint_in_spark.md):

     ```diff
     @@ -26,4 +26,8 @@ sc.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled", "false");
      
      You can use s3a urls like this:
      
     -    s3a://<<bucket>>/<<folder>>/<<file>>
     +    s3a://<<BUCKET>>/<<FOLDER>>/<<FILE>>
     +
     +Also, it is possible to use the credentials in the path:
     +
     +    s3a://<<ACCESS_KEY>>:<<SECRET_KEY>>@<<BUCKET>>/<<FOLDER>>/<<FILE>>
     ```
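     As a usage sketch of the two URL styles (the bucket, path, and key names here are hypothetical placeholders, not from the gist):

     ```scala
     // Credentials taken from the Hadoop configuration (see the created revision below):
     val logs = sc.textFile("s3a://my-bucket/logs/part-00000")

     // Credentials embedded directly in the URL:
     val logs2 = sc.textFile("s3a://MY_ACCESS_KEY:MY_SECRET_KEY@my-bucket/logs/part-00000")

     logs.count()
     ```

     Keys embedded in the URL can end up in logs and shell history, so the configuration-based variant is generally preferable.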
  2. @tobilg revised this gist on Mar 14, 2016. 1 changed file with 5 additions and 1 deletion (custom_s3_endpoint_in_spark.md):

     ```diff
     @@ -1,6 +1,10 @@
     +# Custom S3 endpoints with Spark
     +
     +To be able to use custom endpoints with the latest Spark distribution, one needs to add an external package (`hadoop-aws`). Then, custom endpoints can be configured according to the [docs](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html).
     +
      ## Use the `hadoop-aws` package
      
     -    bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.1
     +    bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.2
      
      ## SparkContext configuration
      
     ```
  3. @tobilg created this gist on Mar 14, 2016, with the initial 25-line version of custom_s3_endpoint_in_spark.md:

    ## Use the `hadoop-aws` package

        bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.1

    ## SparkContext configuration

    Add this to your application, or set it in the `spark-shell`:

    ```scala
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "<<ENDPOINT>>");
    sc.hadoopConfiguration.set("fs.s3a.access.key","<<ACCESS_KEY>>");
    sc.hadoopConfiguration.set("fs.s3a.secret.key","<<SECRET_KEY>>");
    ```
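    The same settings can also be supplied through `SparkConf` before the context is created, since Spark copies any property prefixed with `spark.hadoop.` into the Hadoop configuration. A minimal sketch (the app name is made up; the `<<...>>` values are the same placeholders as above):

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("custom-s3-endpoint")  // hypothetical app name
      // "spark.hadoop."-prefixed keys end up in sc.hadoopConfiguration:
      .set("spark.hadoop.fs.s3a.endpoint", "<<ENDPOINT>>")
      .set("spark.hadoop.fs.s3a.access.key", "<<ACCESS_KEY>>")
      .set("spark.hadoop.fs.s3a.secret.key", "<<SECRET_KEY>>")

    val sc = new SparkContext(conf)
    ```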

    If your endpoint doesn't support HTTPS, then you'll need the following:

    ```scala
    sc.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled", "false");
    ```

    ## S3 url usage

    You can use s3a urls like this:

        s3a://<<bucket>>/<<folder>>/<<file>>
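
    As an end-to-end sketch tying the steps together (endpoint, bucket, and keys are hypothetical placeholders):

    ```scala
    // Point s3a at the custom endpoint and provide credentials:
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.example.internal")
    sc.hadoopConfiguration.set("fs.s3a.access.key", "MY_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "MY_SECRET_KEY")

    // Read from the store, transform, and write the result back:
    val lines = sc.textFile("s3a://my-bucket/input/data.txt")
    val upper = lines.map(_.toUpperCase)
    upper.saveAsTextFile("s3a://my-bucket/output/data-upper")
    ```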