Sanjeev Chakravarty (@sanjeevbadgeville)

  • SAP Sales Cloud
  • Dublin
# install docker-machine v0.8.2 for the current OS and architecture
curl -L https://github.com/docker/machine/releases/download/v0.8.2/docker-machine-`uname -s`-`uname -m` >/usr/local/bin/docker-machine && \
chmod +x /usr/local/bin/docker-machine
docker-machine version

# create a VirtualBox-backed machine named "default" and inspect it
docker-machine ls
docker-machine create --driver virtualbox default
docker-machine ls
docker-machine env default
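
To point the Docker CLI at the new machine, the env output is normally evaluated in the current shell (standard docker-machine usage, shown here as a follow-up to the snippet above):

eval "$(docker-machine env default)"
docker ps
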
sanjeevbadgeville / INSTALL
Created February 6, 2018 07:45 — forked from arya-oss/INSTALL.md
Ubuntu 16.04 Developer Tools installation
### development tools
sudo apt-get install build-essential python-dev git nodejs-legacy npm gnome-tweak-tool openjdk-8-jdk
### Python packages
sudo apt-get install python-pip python-virtualenv python-numpy python-matplotlib
### pip packages
pip install django flask django-widget-tweaks django-ckeditor beautifulsoup4 requests classifier SymPy ipython
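A quick, optional sanity check that the main tools landed (version numbers will vary; the import list below is just an example):

gcc --version | head -1
node -v && npm -v
java -version
python -c "import django, flask, numpy; print('imports ok')"
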
sanjeevbadgeville / zeppelin_solr_spark_oh_my_meetup_notes.md
Created November 17, 2016 02:44 — forked from epugh/zeppelin_solr_spark_oh_my_meetup_notes.md
Steps for following along with Eric's Zeppelin talk.

The steps below all assume you have Docker installed. I used the Kitematic tool for OS X, and it worked great. My local container VM IP is 192.168.99.100; replace that in the commands with your local IP!

  1. Let's Set up Zeppelin

    I am using this Docker image https://github.com/dylanmei/docker-zeppelin to fire up Zeppelin and Spark. Note that it's slow because there are so many processes (Spark Master, Spark Worker, Zeppelin) to start!

    docker run -d --name zeppelin -p 8080:8080 dylanmei/zeppelin
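
    Once the container is up, the Zeppelin UI should answer on port 8080 of the container VM (a quick check, using the example IP from the note above):

    docker ps --filter name=zeppelin
    curl -I http://192.168.99.100:8080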
    
sanjeevbadgeville / Vertica_Query_Times
Created August 23, 2016 19:50 — forked from jackghm/Vertica_Query_Times
Vertica Query request times over time by user
-- Query request times over time by user
select distinct TheDay, user_name
, (min_dat / 1000) as min_sec, (max_dat / 1000) as max_sec
, (avg_dat / 1000) as avg_sec, (median_dat / 1000) as median_sec
, query_cnt
from (
select DATE(end_timestamp::timestamp) as TheDay, user_name
, min(request_duration_ms) over(partition by DATE(end_timestamp::timestamp), user_name ) min_dat
, max(request_duration_ms) over(partition by DATE(end_timestamp::timestamp), user_name ) max_dat
, avg(request_duration_ms) over(partition by DATE(end_timestamp::timestamp), user_name ) avg_dat
sanjeevbadgeville / experiments-spark.R
Created April 20, 2016 18:17
Sample code for working with Apache Spark (v1.4), SparkR and ParquetFiles from RStudio
# see github repos & package documentation
# - http://github.com/apache/spark/tree/master/R
# - http://spark.apache.org/docs/latest/api/R/
# install the SparkR package
devtools::install_github("apache/spark", ref="master", subdir="R/pkg")
# load the SparkR & ggplot2 packages
library('SparkR')
library('ggplot2')
sanjeevbadgeville / generate.r
Created December 1, 2015 21:25 — forked from githoov/generate.r
R Script to Create a Survival Plot and to Generate a Sample Data Set
# preliminaries
library("ggplot2")
library("zoo")
set.seed(111)
# generate plot of survival curve
x <- sort(dexp(seq(0, 1, 0.01)), decreasing = TRUE)
ggplot(data.frame(x = c(0, 5)), aes(x)) + stat_function(fun = dexp, args = list(rate = 1)) + scale_x_continuous(labels=c(expression(t["0"], t["1"], t["2"], t["3"], t["4"], t["5"]))) + labs(x = "Time", y = expression(y = P(T > t["i"])), title = "Survival Function")
# simulate subscription data

Typing `vagrant` from the command line will display a list of all available commands.

Be sure that you are in the same directory as the Vagrantfile when running these commands! A typical session is sketched after the list below.

Common Vagrant Commands

  • vagrant up -- starts vagrant environment (also provisions only on the FIRST vagrant up)
  • vagrant status -- outputs status of the vagrant machine
  • vagrant halt -- stops the vagrant machine
  • vagrant reload -- restarts vagrant machine, loads new Vagrantfile configuration
  • vagrant provision -- forces reprovisioning of the vagrant machine
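A typical session tying these commands together might look like this (the directory name is just a placeholder):

cd my-project/     # the directory containing the Vagrantfile
vagrant up         # boots the machine (provisions only on the first run)
vagrant status
vagrant ssh        # log into the guest; standard Vagrant command, not in the list above
vagrant halt
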
This post examines the features of [R Markdown](http://www.rstudio.org/docs/authoring/using_markdown)
using [knitr](http://yihui.name/knitr/) in RStudio 0.96.
This combination of tools provides an exciting improvement in usability for
[reproducible analysis](http://stats.stackexchange.com/a/15006/183).
Specifically, this post
(1) discusses getting started with R Markdown and `knitr` in RStudio 0.96;
(2) provides a basic example of producing console output and plots using R Markdown;
(3) highlights several code chunk options such as caching and controlling how input and output are displayed;
(4) demonstrates use of standard Markdown notation as well as the extended features of formulas and tables; and
(5) discusses the implications of R Markdown.
# take the first 100 lines of an HDFS file and save them locally as a gzipped sample
hadoop fs -cat /Work/lon_text/lon_order_data_t/cdw320_lon_order_data_t.1.txt | head -100 | gzip > test.csv.gz
# same sampling against a local copy of the file, writing the gzipped output elsewhere
cat cdw320_lon_order_data_t.1.txt | head -100 | gzip > ../../tsnyder/cdw320_lon_order_data_t.1.txt.gz
# stream the gzipped file back out of HDFS and decompress it to verify the contents
hadoop fs -cat /Work/tsnyder/cdw320_lon_order_data_t.1.txt.gz | gunzip
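
The last command reads the gzipped sample back out of HDFS, which assumes it was first copied up; that upload step isn't shown above, but would typically be something like:

hadoop fs -put ../../tsnyder/cdw320_lon_order_data_t.1.txt.gz /Work/tsnyder/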