
@brunocrt
Forked from hkhamm/testing_spark_cassandra.md
Created March 13, 2018 20:59
Revisions

1. @hkhamm revised this gist Aug 18, 2014. 1 changed file with 1 addition and 1 deletion in testing_spark_cassandra.md.

    ```diff
    @@ -161,7 +161,7 @@ scalaVersion := "2.10.4"
     
     libraryDependencies ++= Seq(
     "org.apache.spark" %% "spark-core" % "1.0.2",
    -"com.datastax.spark" %% "spark-cassandra-connector" % "1.0.0-rc2"
    +"com.datastax.spark" %% "spark-cassandra-connector" % "1.0.0-rc3"
     )
     
     resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
    ```
2. @hkhamm revised this gist Aug 18, 2014. 1 changed file with 2 additions and 2 deletions.

    ````diff
    @@ -174,9 +174,9 @@ mergeStrategy in assembly := {
     }
     ```
     
    -Here is the project/plugins.sbt
    +Here is project/plugins.sbt
     
    -```
    +```bash
     addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
     ```
    ````

3. @hkhamm revised this gist Aug 18, 2014. 1 changed file with 17 additions and 0 deletions.

    ````diff
    @@ -149,6 +149,10 @@ object SparkTest extends App {
     Here is the associated build.sbt.
     
     ```scala
    +import AssemblyKeys._
    +
    +assemblySettings
    +
     name := "sparktest"
     
     version := "1.0"
    @@ -161,6 +165,19 @@ libraryDependencies ++= Seq(
     )
     
     resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
    +
    +mergeStrategy in assembly := {
    +case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
    +case m if m.toLowerCase.matches("meta-inf.*\\.rsa$") => MergeStrategy.discard
    +case m if m.toLowerCase.matches("meta-inf.*\\.dsa$") => MergeStrategy.discard
    +case _ => MergeStrategy.first
    +}
     ```
    +
    +Here is the project/plugins.sbt
    +
    +```
    +addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
    +```
     
     The spark-submit command can take all the same options used above for spark-shell. In addition, you must provide your Scala program's main class or object name with the '--class' option. Here the jar is in the user's home directory and named 'sparktest.jar'.
    ````
4. @hkhamm revised this gist Aug 18, 2014. 1 changed file with 2 additions and 2 deletions.

    ````diff
    @@ -41,8 +41,8 @@ sudo service cassandra restart
     - made sure the Spark master and a worker are running:
     
     ```bash
    -./opt/spark/sbin/stop-all.sh
    -./opt/spark/sbin/start-all.sh
    +bash /opt/spark/sbin/stop-all.sh
    +bash /opt/spark/sbin/start-all.sh
     ```
    ````

5. @hkhamm revised this gist Aug 18, 2014. 1 changed file with 1 addition and 1 deletion.

    ````diff
    @@ -49,7 +49,7 @@ sudo service cassandra restart
     spark-shell
     -----------
     
    -The spark-shell command requires that you give it the spark-cassandra-connector libraries with the '--driver-class-path' option, give it the jars you will need with the '--jars' option, and set the master with the '--master' option.
    +Use the spark-shell command and give it the spark-cassandra-connector libraries with the '--driver-class-path' option, the jars you will need with the '--jars' option, and the master with the '--master' option.
     
     ```bash
     spark-shell --driver-class-path $(echo /opt/connector/*.jar | sed 's/ /:/g') --jars $(echo /opt/connector/*.jar | sed 's/ /,/g') --master local[2]
    ````
6. @hkhamm revised this gist Aug 18, 2014. 1 changed file with 2 additions and 2 deletions.

    ````diff
    @@ -80,8 +80,8 @@ scala> :paste
     Copy and paste this Scala script, but **make sure to change the keyspace and table names** passed to sc.cassandraTable to reflect those in your Cassandra database.
     
     ```scala
    -import com.datastax.spark.connector._
     import org.apache.spark._
    +import com.datastax.spark.connector._
     
     // Create the configuration object necessary to start a SparkContext
     val conf = new SparkConf()
    @@ -115,8 +115,8 @@ The spark-submit command allows you to run a jar file instead of pasting a scrip
     Here is a version of the script above you can use in your jar. Don't forget to **change the keyspace and table names**.
     
     ```scala
    -import com.datastax.spark.connector._
     import org.apache.spark._
    +import com.datastax.spark.connector._
     
     object SparkTest extends App {
    ````
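    The import reorder above is cosmetic, but both scripts depend on `import com.datastax.spark.connector._` to bring in the implicit enrichment that adds `cassandraTable` to `SparkContext`. A minimal sketch of the pattern, with placeholder keyspace/table names and a local Cassandra node assumed:

    ```scala
    import org.apache.spark._
    import com.datastax.spark.connector._

    // Configuration for a local Spark master talking to a local Cassandra node
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("sparktest")
      .set("spark.cassandra.connection.host", "127.0.0.1")

    val sc = new SparkContext(conf)

    // cassandraTable is available on sc only because of the connector import
    val table = sc.cassandraTable("my_keyspace", "my_table")
    println(table.count)
    ```

    Without the connector import, `sc.cassandraTable` fails to compile, which is why both versions of the script keep it regardless of ordering.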
7. @hkhamm revised this gist Aug 18, 2014. 1 changed file with 2 additions and 6 deletions.

    ```diff
    @@ -69,7 +69,7 @@ Stop the old SparkContext:
     scala> sc.stop
     ```
     
    -As before, this will often overrun the Scala prompt and you might have to hit enter to get a clear prompt. 
    +As before, this will often overrun the Scala prompt and you might have to hit enter to get a clear prompt.
     
     Use the ':paste' command to enter paste mode:
    @@ -104,11 +104,7 @@ table.count
     
     ```
     
    -After pasting, make sure you are on a new line and hit ctrl-d to exit paste mode and run the script.
    -
    -The script will run for a bit, creating the new SparkContext, adding the jars, and talking with Cassandra. At the end, you should see the number of rows in your database.
    -
    -If everything worked out, you have just run a successful Spark/Cassandra test. Congratulations!
    +After pasting, make sure you are on a new line and hit ctrl-d to exit paste mode and run the script. The script will run for a bit, creating the new SparkContext, adding the jars, and talking with Cassandra. At the end, you should see the number of rows in your database. If everything worked out, you have just run a successful Spark/Cassandra test. Congratulations!
     
     
     spark-submit
    ```
8. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 2 additions and 2 deletions.

    ```diff
    @@ -160,8 +160,8 @@ version := "1.0"
     scalaVersion := "2.10.4"
     
     libraryDependencies ++= Seq(
    -"org.apache.spark" % "spark-core" % "1.0.2",
    -"com.datastax.spark" % "spark-cassandra-connector" % "1.0.0-rc2"
    +"org.apache.spark" %% "spark-core" % "1.0.2",
    +"com.datastax.spark" %% "spark-cassandra-connector" % "1.0.0-rc2"
     )
     
     resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
    ```
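    The `%%` restored in this revision is sbt's cross-version operator: it appends the project's Scala binary version to the module name, whereas plain `%` uses the name verbatim. A minimal sketch of the equivalence, assuming the `scalaVersion := "2.10.4"` setting from the build.sbt above:

    ```scala
    // Cross-versioned form: sbt derives the _2.10 suffix from scalaVersion
    "org.apache.spark" %% "spark-core" % "1.0.2"
    // ...resolves to the same artifact as spelling the suffix out by hand:
    "org.apache.spark" % "spark-core_2.10" % "1.0.2"
    ```

    This is why the intervening `%` revision would fail to resolve: the published artifacts carry the Scala-version suffix (e.g. `spark-core_2.10`), not the bare module name.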
9. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 2 additions and 2 deletions.

    ```diff
    @@ -160,8 +160,8 @@ version := "1.0"
     scalaVersion := "2.10.4"
     
     libraryDependencies ++= Seq(
    -"org.apache.spark" %% "spark-core" % "1.0.2",
    -"com.datastax.spark" %% "spark-cassandra-connector" % "1.0.0-rc2"
    +"org.apache.spark" % "spark-core" % "1.0.2",
    +"com.datastax.spark" % "spark-cassandra-connector" % "1.0.0-rc2"
     )
     
     resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
    ```
10. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ```diff
    @@ -159,7 +159,7 @@ version := "1.0"
     
     scalaVersion := "2.10.4"
     
    -libraryDependencies += Seq(
    +libraryDependencies ++= Seq(
     "org.apache.spark" %% "spark-core" % "1.0.2",
     "com.datastax.spark" %% "spark-cassandra-connector" % "1.0.0-rc2"
     )
    ```
11. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 2 additions and 2 deletions.

    ````diff
    @@ -178,13 +178,13 @@ Just as above, you can watch as it executes the script. If everything goes well,
     Troubleshooting
     ---------------
     
    -The spark-submit command rejects jars with invalid signature files. You can check for this before submitting the jar to Spark:
    +The spark-submit command rejects jars with invalid signature files. You can check for this before submitting the jar to Spark.
     
     ```bash
     jarsigner -verify sparktest.jar
     ```
     
    -Here is the error you'll see if there is a problem:
    +Here is the error you'll see if there is a problem.
     
     ```bash
     jarsigner: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
    ````
12. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ````diff
    @@ -196,7 +196,7 @@ You can remove the signature file(s) and unsign the jar using the zip command li
     zip -d sparktest.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF
     ```
     
    -Now jarsigner will tell you the following and spark-submit should have no problem using the jar.
    +Now jarsigner will tell you the following and spark-submit shouldn't complain about an invalid signature file.
     
     ```bash
     jar is unsigned. (signatures missing or not parsable)
    ````
13. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 4 additions and 4 deletions.

    ```diff
    @@ -4,10 +4,10 @@ Testing Spark and Cassandra
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
     ####Table of Contents
    -[Assumptions](#assumptions)
    -[spark-shell](#spark-shell)
    -[spark-submit](#spark-submit)
    -[Troubleshooting](#troubleshooting)
    +- [Assumptions](#assumptions)
    +- [spark-shell](#spark-shell)
    +- [spark-submit](#spark-submit)
    +- [Troubleshooting](#troubleshooting)
     
     
     Assumptions
    ```
14. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 2 deletions.

    ```diff
    @@ -3,8 +3,7 @@ Testing Spark and Cassandra
     
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
    -####Table of Contents|
    ----------------------|
    +####Table of Contents
     [Assumptions](#assumptions)
     [spark-shell](#spark-shell)
     [spark-submit](#spark-submit)
    ```
15. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 2 additions and 2 deletions.

    ```diff
    @@ -3,8 +3,8 @@ Testing Spark and Cassandra
     
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
    -####Table of Contents
    ----------------------
    +####Table of Contents|
    +---------------------|
     [Assumptions](#assumptions)
     [spark-shell](#spark-shell)
     [spark-submit](#spark-submit)
    ```
16. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 5 additions and 4 deletions.

    ```diff
    @@ -4,10 +4,11 @@ Testing Spark and Cassandra
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
     ####Table of Contents
    -- [Assumptions](#assumptions)
    -- [spark-shell](#spark-shell)
    -- [spark-submit](#spark-submit)
    -- [Troubleshooting](#troubleshooting)
    +---------------------
    +[Assumptions](#assumptions)
    +[spark-shell](#spark-shell)
    +[spark-submit](#spark-submit)
    +[Troubleshooting](#troubleshooting)
     
     
     Assumptions
    ```
17. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ```diff
    @@ -3,7 +3,7 @@ Testing Spark and Cassandra
     
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
    -###Table of Contents
    +####Table of Contents
     - [Assumptions](#assumptions)
     - [spark-shell](#spark-shell)
     - [spark-submit](#spark-submit)
    ```
18. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ```diff
    @@ -3,7 +3,7 @@ Testing Spark and Cassandra
     
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
    -##Table of Contents
    +###Table of Contents
     - [Assumptions](#assumptions)
     - [spark-shell](#spark-shell)
     - [spark-submit](#spark-submit)
    ```
19. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 2 additions and 0 deletions.

    ```diff
    @@ -6,6 +6,8 @@ This guide leads the user through a basic test of a local [Spark](https://spark.
     ##Table of Contents
     - [Assumptions](#assumptions)
     - [spark-shell](#spark-shell)
    +- [spark-submit](#spark-submit)
    +- [Troubleshooting](#troubleshooting)
     
     
     Assumptions
    ```
20. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 2 additions and 1 deletion.

    ```diff
    @@ -4,7 +4,8 @@ Testing Spark and Cassandra
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
     ##Table of Contents
    -- [Assumptions](#Assumptions)
    +- [Assumptions](#assumptions)
    +- [spark-shell](#spark-shell)
     
     
     Assumptions
    ```
21. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 4 additions and 0 deletions.

    ```diff
    @@ -3,6 +3,10 @@ Testing Spark and Cassandra
     
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
    +##Table of Contents
    +- [Assumptions](#Assumptions)
    +
    +
     Assumptions
     -----------
     
    ```
22. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 0 additions and 2 deletions.

    ```diff
    @@ -3,8 +3,6 @@ Testing Spark and Cassandra
     
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
    -[TOC]
    -
     Assumptions
     -----------
     
    ```
23. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 0 deletions.

    ```diff
    @@ -3,6 +3,7 @@ Testing Spark and Cassandra
     
     This guide leads the user through a basic test of a local [Spark](https://spark.apache.org/) and [Cassandra](http://cassandra.apache.org/) software stack.
     
    +[TOC]
     
     Assumptions
     -----------
    ```
24. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 2 additions and 2 deletions.

    ````diff
    @@ -190,9 +190,9 @@ You can remove the signature file(s) and unsign the jar using the zip command li
     zip -d sparktest.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF
     ```
     
    -Now jarsigner will tell you this:
    +Now jarsigner will tell you the following and spark-submit should have no problem using the jar.
     
    -```
    +```bash
     jar is unsigned. (signatures missing or not parsable)
     ```
    ````

25. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 6 additions and 0 deletions.

    ````diff
    @@ -190,6 +190,12 @@ You can remove the signature file(s) and unsign the jar using the zip command li
     zip -d sparktest.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF
     ```
     
    +Now jarsigner will tell you this:
    +
    +```
    +jar is unsigned. (signatures missing or not parsable)
    +```
    +
     
     ---
     
    ````
26. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ````diff
    @@ -49,7 +49,7 @@ The spark-shell command requires that you give it the spark-cassandra-connector
     spark-shell --driver-class-path $(echo /opt/connector/*.jar | sed 's/ /:/g') --jars $(echo /opt/connector/*.jar | sed 's/ /,/g') --master local[2]
     ```
     
    -You can watch as the shell initializes itself. After a bit, you will either see the basic Scala prompt or it will obviously stop printing to stdout and you can hit enter to clear the prompt.
    +You can watch as the shell initializes itself. After a bit, you will either see the basic Scala prompt or it will eventually stop printing to stdout and you can hit enter to clear the prompt.
     
     ```
     scala>
    ````
27. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ````diff
    @@ -184,7 +184,7 @@ Here is the error you'll see if there is a problem:
     jarsigner: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
     ```
     
    -You can remove the signature file(s) and unsign the jar using the zip command line too and using the '-d' option to remove all *.RSA, *.DSA, and *.SF files from the jar's META_INF directory.
    +You can remove the signature file(s) and unsign the jar using the zip command line tool with the '-d' option to delete all *.RSA, *.DSA, and *.SF files from the jar's META_INF directory.
     
     ```bash
     zip -d sparktest.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF
    ````
28. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ````diff
    @@ -184,7 +184,7 @@ Here is the error you'll see if there is a problem:
     jarsigner: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
     ```
     
    -You can run this command on the jar to remove the signature file(s) and unsign the jar:
    +You can remove the signature file(s) and unsign the jar using the zip command line too and using the '-d' option to remove all *.RSA, *.DSA, and *.SF files from the jar's META_INF directory.
     
     ```bash
     zip -d sparktest.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF
    ````
29. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ````diff
    @@ -178,7 +178,7 @@ The spark-submit command rejects jars with invalid signature files. You can chec
     jarsigner -verify sparktest.jar
     ```
     
    -Here is the error you will see if there is a problem:
    +Here is the error you'll see if there is a problem:
     
     ```bash
     jarsigner: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
    ````
30. @hkhamm revised this gist Aug 17, 2014. 1 changed file with 1 addition and 1 deletion.

    ````diff
    @@ -161,7 +161,7 @@ libraryDependencies += Seq(
     resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
     ```
     
    -The spark-submit command can take all the same options used above for spark-shell. In addition, you must provide your Scala program's main class or object name with the '--class' option. Here the jar is in the users home directory and named 'sparktest.jar'.
    +The spark-submit command can take all the same options used above for spark-shell. In addition, you must provide your Scala program's main class or object name with the '--class' option. Here the jar is in the user's home directory and named 'sparktest.jar'.
     
     ```bash
     spark-submit --driver-class-path $(echo /opt/connector/*.jar | sed 's/ /:/g') --jars $(echo /opt/connector/*.jar | sed 's/ /,/g') --master local[2] --class SparkTest ~/sparktest.jar
    ````