#!/usr/bin/env bash
# In this case I have a Hadoop distro compiled from source:
# MAVEN_OPTS="-Xms512m -Xmx1024m" mvn package -Pdist,native -DskipTests -Dtar
# verified with:
# hadoop checknative -a
# with output:
# Native library checking:
# hadoop: true /usr/local/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0
# zlib: true /lib64/libz.so.1
# snappy: true /usr/lib64/libsnappy.so.1
# lz4: true revision:99
# bzip2: true /lib64/libbz2.so.1
# openssl: true /usr/lib64/libcrypto.so
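# (One way to install the freshly built distribution, assuming the default output
# location of the Hadoop Maven build, hadoop-dist/target/hadoop-2.6.0.tar.gz:
#   tar -xzf hadoop-dist/target/hadoop-2.6.0.tar.gz -C /usr/local
# then point HADOOP_HOME and PATH at /usr/local/hadoop-2.6.0.)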
# Build Spark without bundled Hadoop dependencies.
# Don't forget to install snappy & snappy-devel on RHEL/CentOS and similar distros; see the example below.
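# (A minimal sketch of that prerequisite on a yum-based system; package names may
# differ on other distros:
#   sudo yum install -y snappy snappy-devel
# )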
# Spark's Hadoop dependencies should be configured as described in https://spark.apache.org/docs/latest/hadoop-provided.html
# spark-defaults.conf:
# spark.rdd.compress true
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.localExecution.enabled true
# spark.master yarn
# spark.yarn.jar hdfs:///user/spark/lib/spark-assembly-1.4.1-hadoop2.6.0.jar
# spark-env.sh:
# HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
# SPARK_DIST_CLASSPATH=$(hadoop classpath)
./make-distribution.sh --name without-hadoop --tgz \
  -Phadoop-2.6 -Phadoop-provided -Pyarn \
  -Phive -Phive-thriftserver -Psparkr \
  -DzincPort=3038 -DskipTests -Dmaven.javadoc.skip=true
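# Once the build finishes, the assembly jar referenced by spark.yarn.jar above has to be
# available in HDFS. A sketch of that step, assuming the jar lands under dist/lib/ with
# the name used in spark-defaults.conf:
#   hdfs dfs -mkdir -p /user/spark/lib
#   hdfs dfs -put dist/lib/spark-assembly-1.4.1-hadoop2.6.0.jar /user/spark/lib/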