## On your client machine

### As the `root` user

1. Install DSE
2. In the cassandra.yaml file, ensure the datacenter and cluster names match your analytics datacenter
3. In the cassandra-env.sh file, add this configuration line toward the bottom: `JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"`. This makes your DSE node a coordinator only; it will not own any data. You can use this node to submit jobs to DSE locally without needing to know which node is the master.
4. Start DSE
5. Install Python
6. Install virtualenv

### As the `cassandra` user

```
> virtualenv .jupyter
> source .jupyter/bin/activate
> pip install ipython
> pip install jupyter
> PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS pyspark-shell" IPYTHON_OPTS="notebook --ip='*' --no-browser" dse pyspark
```

## Notes

You can use something like `supervisord` to keep Jupyter running in the background.

If you get a permission denied error when starting pyspark that looks like this:

`OSError: [Errno 13] Permission denied: '/run/user/505/jupyter'`

it is because `XDG_RUNTIME_DIR` is set to your logged-in user. In that case, set the following environment variable before starting pyspark:

`JUPYTER_RUNTIME_DIR="$HOME/.jupyter/runtime"`
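
A supervisord program entry for this might look like the sketch below. The file path, program name, and wrapper script `start-jupyter.sh` (a script holding the `dse pyspark` launch commands from the section above) are assumptions; adapt them to your install:

```ini
; /etc/supervisord.d/jupyter.ini -- hypothetical path; adjust to your layout
[program:jupyter-pyspark]
; start-jupyter.sh is an assumed wrapper containing the virtualenv
; activation and "dse pyspark" launch commands shown above
command=/home/cassandra/start-jupyter.sh
directory=/home/cassandra
user=cassandra
autostart=true
autorestart=true
; work around the XDG_RUNTIME_DIR permission error described above
environment=JUPYTER_RUNTIME_DIR="/home/cassandra/.jupyter/runtime"
```

Running under the `cassandra` user here matches the section above, and `autorestart=true` brings the notebook back if pyspark exits.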