@bartekdobija
Last active August 31, 2015 13:33
Revisions

  1. bartekdobija revised this gist Aug 31, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion spark-without-hadoop.sh
    @@ -51,7 +51,7 @@
     #spark.executor.extraClassPath /usr/local/lib/jdbc/sqlserver/*.jar:/usr/local/lib/jdbc/mysql/*.jar:/usr/local/anaconda/bin

     ####### spark-env.sh #######
    -# HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
    +# HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/
     # SPARK_DIST_CLASSPATH=$(hadoop classpath)
     # LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/hadoop/lib/native/
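
    Pulled out of the diff above, the spark-env.sh this gist converges on looks roughly like the sketch below: a minimal sketch assuming Hadoop is installed under /usr/local/hadoop as in the gist (the export keywords are an addition here; spark-env.sh is sourced, so plain assignments work too):

        #!/usr/bin/env bash
        # Point Spark at the cluster's Hadoop configuration.
        export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/
        # A "hadoop-provided" Spark build ships no Hadoop classes of its own,
        # so borrow them from the local Hadoop installation.
        export SPARK_DIST_CLASSPATH=$(hadoop classpath)
        # Make the native codec libraries (snappy, zlib, lz4, ...) loadable at runtime.
        export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/hadoop/lib/native/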

  2. bartekdobija revised this gist Aug 31, 2015. 1 changed file with 0 additions and 1 deletion.
    1 change: 0 additions & 1 deletion spark-without-hadoop.sh
    @@ -18,7 +18,6 @@
     # Spark dependencies should be configured as per this document https://spark.apache.org/docs/latest/hadoop-provided.html

     ####### spark-defaults.conf: #######
    -
     #spark.yarn.jar hdfs:///user/spark/share/lib/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
     #spark.ui.enabled false
     ##spark.shuffle.spill false
  3. bartekdobija revised this gist Aug 31, 2015. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions spark-without-hadoop.sh
    @@ -16,7 +16,9 @@
     # Spark without hadoop dependencies.
     # Don't forget to install snappy & snappy-devel on RHEL/CentOS etc.
     # Spark dependencies should be configured as per this document https://spark.apache.org/docs/latest/hadoop-provided.html
    -# spark-defaults.conf:
    +
    +####### spark-defaults.conf: #######
    +
     #spark.yarn.jar hdfs:///user/spark/share/lib/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
     #spark.ui.enabled false
     ##spark.shuffle.spill false
    @@ -49,7 +51,7 @@
     #spark.executor.extraLibraryPath /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
     #spark.executor.extraClassPath /usr/local/lib/jdbc/sqlserver/*.jar:/usr/local/lib/jdbc/mysql/*.jar:/usr/local/anaconda/bin

    -# spark-env.sh
    +####### spark-env.sh #######
     # HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
     # SPARK_DIST_CLASSPATH=$(hadoop classpath)
     # LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/hadoop/lib/native/
  4. bartekdobija revised this gist Aug 31, 2015. 1 changed file with 31 additions and 6 deletions.
    37 changes: 31 additions & 6 deletions spark-without-hadoop.sh
    @@ -16,13 +16,38 @@
     # Spark without hadoop dependencies.
     # Don't forget to install snappy & snappy-devel on RHEL/CentOS etc.
     # Spark dependencies should be configured as per this document https://spark.apache.org/docs/latest/hadoop-provided.html
    -
     # spark-defaults.conf:
    -# spark.rdd.compress true
    -# spark.serializer org.apache.spark.serializer.KryoSerializer
    -# spark.localExecution.enabled true
    -# spark.master yarn
    -# spark.yarn.jar hdfs:///user/spark/lib/spark-assembly-1.4.1-hadoop2.6.0.jar
    +#spark.yarn.jar hdfs:///user/spark/share/lib/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
    +#spark.ui.enabled false
    +##spark.shuffle.spill false
    +##spark.shuffle.spill.compress true
    +##spark.shuffle.consolidateFiles true
    +##spark.shuffle.service.enabled true
    +## Execution Behavior
    +#spark.broadcast.blockSize 4096
    +## Dynamic Resource Allocation (YARN)
    +##spark.dynamicAllocation.enabled true
    +##spark.dynamicAllocation.executorIdleTimeout 10800
    +##spark.dynamicAllocation.initialExecutors 3
    +##spark.speculation true
    +#spark.scheduler.mode FAIR
    +#spark.executor.memory 5G
    +#spark.kryoserializer.buffer.max 1000m
    +#spark.driver.maxResultSize 0
    +#spark.serializer org.apache.spark.serializer.KryoSerializer
    +#spark.yarn.preserve.staging.files false
    +#spark.master yarn
    +#spark.rdd.compress true
    +## Local execution of selected Spark functions
    +#spark.localExecution.enabled true
    +#spark.sql.parquet.binaryAsString true
    +#spark.sql.parquet.compression.codec snappy
    +## use lz4 compression for broadcast variables as Snappy is not supported on MacOSX
    +#spark.broadcast.compress true
    +#spark.io.compression.codec lz4
    +#spark.driver.extraLibraryPath /usr/local/hadoop/lib/native/
    +#spark.executor.extraLibraryPath /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
    +#spark.executor.extraClassPath /usr/local/lib/jdbc/sqlserver/*.jar:/usr/local/lib/jdbc/mysql/*.jar:/usr/local/anaconda/bin

     # spark-env.sh
     # HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
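
    A quick way to confirm that entries in spark-defaults.conf were actually picked up is to read them back from the driver's SparkConf. This is a hedged sketch: spark-shell and SparkConf.get are standard Spark 1.x, but the keys checked are taken from the gist's own config above:

        # spark-shell reads conf/spark-defaults.conf automatically;
        # print a setting back to confirm it was applied.
        echo 'println(sc.getConf.get("spark.serializer"))' | spark-shell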
  5. bartekdobija revised this gist Aug 30, 2015. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions spark-without-hadoop.sh
    @@ -27,5 +27,6 @@
     # spark-env.sh
     # HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
     # SPARK_DIST_CLASSPATH=$(hadoop classpath)
    +# LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/hadoop/lib/native/

     ./make-distribution.sh --name without-hadoop --tgz -Phadoop-2.6 -Psparkr -Phadoop-provided -Phive -Phive-thriftserver -Pyarn -DzincPort=3038 -DskipTests -Dmaven.javadoc.skip=true
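
    For context, the end-to-end build flow around that command, as a sketch; the tarball name follows make-distribution.sh's spark-<version>-bin-<name>.tgz convention, and the exact version depends on the branch checked out:

        # Run from the root of an Apache Spark source checkout.
        ./make-distribution.sh --name without-hadoop --tgz \
          -Phadoop-2.6 -Psparkr -Phadoop-provided -Phive -Phive-thriftserver -Pyarn \
          -DzincPort=3038 -DskipTests -Dmaven.javadoc.skip=true
        # The tarball lands in the source root, e.g. spark-1.5.0-SNAPSHOT-bin-without-hadoop.tgz
        tar -xzf spark-*-bin-without-hadoop.tgz -C /usr/local/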
  6. bartekdobija renamed this gist Aug 30, 2015. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  7. bartekdobija revised this gist Aug 30, 2015. 1 changed file with 13 additions and 0 deletions.
    13 changes: 13 additions & 0 deletions apache spark without hadoop
    @@ -1,5 +1,18 @@
     #!/usr/bin/env bash

    +# In this case I have a Hadoop distro compiled from source:
    +# MAVEN_OPTS="-Xms512m -Xmx1024m" mvn package -Pdist,native -DskipTests -Dtar
    +# verified with:
    +# hadoop checknative -a
    +# with output:
    +# Native library checking:
    +# hadoop: true /usr/local/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0
    +# zlib: true /lib64/libz.so.1
    +# snappy: true /usr/lib64/libsnappy.so.1
    +# lz4: true revision:99
    +# bzip2: true /lib64/libbz2.so.1
    +# openssl: true /usr/lib64/libcrypto.so
    +
     # Spark without hadoop dependencies.
     # Don't forget to install snappy & snappy-devel on RHEL/CentOS etc.
     # Spark dependencies should be configured as per this document https://spark.apache.org/docs/latest/hadoop-provided.html
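
    The snappy prerequisite called out above is a plain package install on RHEL/CentOS; a sketch assuming yum, since package names can differ on other distributions:

        # snappy is the runtime library; snappy-devel provides the headers
        # needed when compiling Hadoop's native code (-Pdist,native).
        sudo yum install -y snappy snappy-devel
        # Then re-check that Hadoop can load all of its native codecs:
        hadoop checknative -a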
  8. bartekdobija renamed this gist Aug 30, 2015. 1 changed file with 11 additions and 0 deletions.
    11 changes: 11 additions & 0 deletions gistfile1.sh → apache spark without hadoop
    @@ -4,4 +4,15 @@
     # Don't forget to install snappy & snappy-devel on RHEL/CentOS etc.
     # Spark dependencies should be configured as per this document https://spark.apache.org/docs/latest/hadoop-provided.html

    +# spark-defaults.conf:
    +# spark.rdd.compress true
    +# spark.serializer org.apache.spark.serializer.KryoSerializer
    +# spark.localExecution.enabled true
    +# spark.master yarn
    +# spark.yarn.jar hdfs:///user/spark/lib/spark-assembly-1.4.1-hadoop2.6.0.jar
    +
    +# spark-env.sh
    +# HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
    +# SPARK_DIST_CLASSPATH=$(hadoop classpath)
    +
     ./make-distribution.sh --name without-hadoop --tgz -Phadoop-2.6 -Psparkr -Phadoop-provided -Phive -Phive-thriftserver -Pyarn -DzincPort=3038 -DskipTests -Dmaven.javadoc.skip=true
  9. bartekdobija revised this gist Aug 30, 2015. 1 changed file with 5 additions and 1 deletion.
    6 changes: 5 additions & 1 deletion gistfile1.sh
    @@ -1,3 +1,7 @@
     #!/usr/bin/env bash

    -./make-distribution.sh --tgz --with-tachyon -Phadoop-2.4 -Dhadoop.version=2.5.0 -Pyarn -Phive
    +# Spark without hadoop dependencies.
    +# Don't forget to install snappy & snappy-devel on RHEL/CentOS etc.
    +# Spark dependencies should be configured as per this document https://spark.apache.org/docs/latest/hadoop-provided.html
    +
    +./make-distribution.sh --name without-hadoop --tgz -Phadoop-2.6 -Psparkr -Phadoop-provided -Phive -Phive-thriftserver -Pyarn -DzincPort=3038 -DskipTests -Dmaven.javadoc.skip=true
  10. bartekdobija created this gist Nov 20, 2014.
    3 changes: 3 additions & 0 deletions gistfile1.sh
    @@ -0,0 +1,3 @@
    +#!/usr/bin/env bash
    +
    +./make-distribution.sh --tgz --with-tachyon -Phadoop-2.4 -Dhadoop.version=2.5.0 -Pyarn -Phive