
@jp2007, forked from hkhamm/install_spark.md, created October 15, 2015 17:26

Revisions

  1. @hkhamm revised this gist Aug 15, 2014. 1 changed file, install_spark.md, with 7 additions and 8 deletions:

     @@ -13,17 +13,16 @@ brew install apache-spark
      Get the Spark Cassandra Connector
      ---------------------------------

     -Create a directory for the connector and its support libs and cd into the directory:
     +Clone the download script from Github Gist:

      ```
     -mkdir connector
     -cd connector
     +git clone https://gist.github.com/b700fe70f0025a519171.git
      ```

     -Clone the download script from Github Gist:
     +Rename the cloned directory:

      ```
     -git clone https://gist.github.com/b700fe70f0025a519171.git
     +mv b700fe70f0025a519171 connector
      ```

      Run the script:

     @@ -36,7 +35,7 @@ Start the Spark Master and a Worker
      -----------------------------------

      ```
     -./usr/local/Cellar/apache-spark/sbin/start-all.sh
     +./usr/local/Cellar/apache-spark/1.0.2/libexec/sbin/start-all.sh
      ```

      Testing the install

     @@ -77,13 +76,13 @@ import org.apache.spark._
      val conf = new SparkConf()
      conf.set("spark.cassandra.connection.host", "127.0.0.1")
     -conf.set("spark.home","/usr/local/Cellar/apache-spark/")
     +conf.set("spark.home","/usr/local/Cellar/apache-spark/1.0.2/libexec")
      // You may not need these two settings if you haven't set up password authentication in Cassandra
      conf.set("spark.cassandra.auth.username", "cassandra")
      conf.set("spark.cassandra.auth.password", "cassandra")
     -val sc = new SparkContext("spark://ip-10-0-0-192:7077", "Cassandra Connector Test", conf)
     +val sc = new SparkContext("spark://localhost:7077", "Cassandra Connector Test", conf)
      sc.addJar("path/to/connector/cassandra-driver-core-2.0.3.jar")
      sc.addJar("path/to/connector/cassandra-thrift-2.0.9.jar")
      sc.addJar("path/to/connector/commons-codec-1.2.jar")
  2. @hkhamm revised this gist Aug 15, 2014. 1 changed file, install_spark.md, with 7 additions and 0 deletions:

     @@ -32,6 +32,13 @@ Run the script:
      bash install_connector.sh
      ```

     +Start the Spark Master and a Worker
     +-----------------------------------
     +
     +```
     +./usr/local/Cellar/apache-spark/sbin/start-all.sh
     +```
     +
      Testing the install
      -------------------

  3. @hkhamm revised this gist Aug 15, 2014. 1 changed file, install_spark.md, with 2 additions and 2 deletions:

     @@ -1,5 +1,5 @@
     -Install and Setup Spark for Cassandra on Max OS X
     -=================================================
     +Install, Setup, and Test Spark and Cassandra on Mac OS X
     +========================================================

      This Gist assumes you already followed the instructions to [install Cassandra](https://gist.github.com/hkhamm/a9a2b45dd749e5d3b3ae), created a keyspace and table, and added some data.

  4. @hkhamm created this gist Aug 15, 2014. 1 changed file, install_spark.md, with 111 additions and 0 deletions. The file as created:

    Install and Setup Spark for Cassandra on Max OS X
    =================================================

    This Gist assumes you already followed the instructions to [install Cassandra](https://gist.github.com/hkhamm/a9a2b45dd749e5d3b3ae), created a keyspace and table, and added some data.

    Install Apache Spark
    --------------------

    ```
    brew install apache-spark
    ```
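
     Later steps reference Spark's install path under Homebrew's Cellar (this guide assumes version 1.0.2). Since the version directory on your machine may differ, it is worth confirming where Homebrew actually put Spark:

     ```
     # Confirm the install location; the version directory (e.g. 1.0.2) may differ.
     brew info apache-spark
     ls /usr/local/Cellar/apache-spark/
     ```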

    Get the Spark Cassandra Connector
    ---------------------------------

    Create a directory for the connector and its support libs and cd into the directory:

    ```
    mkdir connector
    cd connector
    ```

    Clone the download script from Github Gist:

    ```
    git clone https://gist.github.com/b700fe70f0025a519171.git
    ```

    Run the script:

    ```
    bash install_connector.sh
    ```
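
     The script pulls the connector jar and its supporting libraries into the connector directory. As a rough sanity check (the jar names below are taken from the list used later in this guide), list the directory and look for them:

     ```
     # The connector directory should now hold jars such as:
     #   spark-cassandra-connector_2.10-1.0.0-rc2.jar
     #   cassandra-driver-core-2.0.3.jar
     #   guava-16.0.1.jar
     ls
     ```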

    Testing the install
    -------------------

    Make a note of the path to your connector directory.

    Open the Spark Shell with the connector:


    ```
    spark-shell --driver-class-path $(echo path/to/connector/*.jar | sed 's/ /:/g')
    ```
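
     The $(echo path/to/connector/*.jar | sed 's/ /:/g') substitution simply joins every jar in the connector directory into one colon-separated classpath. Written out by hand, the command is equivalent to something like this (most jars elided here for brevity):

     ```
     # Equivalent expanded form; in practice, list every jar in the connector directory.
     spark-shell --driver-class-path path/to/connector/cassandra-driver-core-2.0.3.jar:path/to/connector/spark-cassandra-connector_2.10-1.0.0-rc2.jar
     ```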

     Wait for everything to load. Once it finishes, you'll see the Scala prompt:

    ```
     scala>
    ```

    You'll need to stop the default SparkContext, since you'll create your own with the script.

    ```
     scala> sc.stop
    ```

    Once that is finished, get ready to paste the script in:

    ```
     scala> :paste
    ```

     Paste in this script. Be sure to change the path to the connector, and replace keyspace and table with the names of your own keyspace and table:

    ```
    import com.datastax.spark.connector._
    import org.apache.spark._
    val conf = new SparkConf()
    conf.set("spark.cassandra.connection.host", "127.0.0.1")
    conf.set("spark.home","/usr/local/Cellar/apache-spark/")
    // You may not need these two settings if you haven't set up password authentication in Cassandra
    conf.set("spark.cassandra.auth.username", "cassandra")
    conf.set("spark.cassandra.auth.password", "cassandra")
    val sc = new SparkContext("spark://ip-10-0-0-192:7077", "Cassandra Connector Test", conf)
    sc.addJar("path/to/connector/cassandra-driver-core-2.0.3.jar")
    sc.addJar("path/to/connector/cassandra-thrift-2.0.9.jar")
    sc.addJar("path/to/connector/commons-codec-1.2.jar")
    sc.addJar("path/to/connector/commons-lang3-3.1.jar")
    sc.addJar("path/to/connector/commons-logging-1.1.1.jar")
    sc.addJar("path/to/connector/guava-16.0.1.jar")
    sc.addJar("path/to/connector/httpclient-4.2.5.jar")
    sc.addJar("path/to/connector/httpcore-4.2.4.jar")
    sc.addJar("path/to/connector/joda-convert-1.6.jar")
    sc.addJar("path/to/connector/joda-time-2.3.jar")
    sc.addJar("path/to/connector/libthrift-0.9.1.jar")
    sc.addJar("path/to/connector/lz4-1.2.0.jar")
    sc.addJar("path/to/connector/metrics-core-3.0.2.jar")
    sc.addJar("path/to/connector/netty-3.9.0.Final.jar")
    sc.addJar("path/to/connector/slf4j-api-1.7.5.jar")
    sc.addJar("path/to/connector/snappy-java-1.0.5.jar")
    sc.addJar("path/to/connector/spark-cassandra-connector_2.10-1.0.0-rc2.jar")
    val table = sc.cassandraTable("keyspace", "table")
    table.count
    ```

     Make sure you are on a new line after 'table.count', then hit Ctrl-D to exit paste mode.

     If everything is set up correctly, Spark will run the script and print the number of rows in your Cassandra table at the end.
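
     Once the count works, you can poke at the data from the same shell. Here is a minimal sketch of a follow-up query; it assumes your table has a text column, with "name" used purely as a placeholder for one of your real column names:

     ```
     // Pull a few rows back to the driver and read a column by name.
     // "name" is a placeholder; substitute a column that exists in your table.
     val rows = table.take(3)
     rows.foreach(row => println(row.getString("name")))
     ```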

    Thanks to Al Toby, Open Source Mechanic at DataStax, for the connector installation script and for the [blog post](http://planetcassandra.org/blog/installing-the-cassandra-spark-oss-stack/) that helped me write this guide.

    Have fun with Spark and Cassandra!