abderrahim.boussetta (aboussetta)

  • DBACENTER
  • London
aboussetta / csvprocessor.py
Created October 25, 2018 09:19 — forked from miku/csvprocessor.py
CSV processor examples for luigi. Can serialize *args to CSV. Can deserialize CSV rows into namedtuples if requested. -- "works on my machine".
from luigi.format import Format
import csvkit

class CSVOutputProcessor(object):
    """
    A simple CSV output processor to be hooked into Format's
    `pipe_writer`.

    If `cols` are given, the names are used as CSV header, otherwise no
    explicit header is written.
    """
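The preview cuts off at the docstring. The serialize-args-to-CSV and rows-to-namedtuples idea it describes can be sketched with only the stdlib csv module (the names below are illustrative, not the gist's own code, which uses csvkit):

```python
import csv
import io
from collections import namedtuple

# Serialize rows of values to CSV, with column names as the header.
cols = ["id", "name"]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(cols)
writer.writerow([1, "alice"])
writer.writerow([2, "bob"])

# Deserialize each CSV row back into a namedtuple keyed by the header.
Row = namedtuple("Row", cols)
buf.seek(0)
reader = csv.reader(buf)
next(reader)  # skip the header row
rows = [Row(*r) for r in reader]
print(rows[0].name)  # -> alice
```

Note that the csv module reads every field back as a string, so `rows[0].id` is `"1"`, not `1`.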
aboussetta / run_luigi.py
Created September 3, 2018 13:44 — forked from bonzanini/run_luigi.py
Example of Luigi task pipeline
# run with a custom --n
# python run_luigi.py SquaredNumbers --local-scheduler --n 20
import luigi

class PrintNumbers(luigi.Task):
    n = luigi.IntParameter(default=10)

    def requires(self):
        return []
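The forked gist goes on to define a SquaredNumbers task that depends on PrintNumbers. The requires()/run() contract it relies on can be mimicked in plain Python without luigi installed (class and method names below are illustrative, not the gist's):

```python
# A tiny dependency-ordered runner mimicking luigi's requires()/run()
# contract: dependencies run before the task that requires them.

class Task:
    def requires(self):
        return []
    def run(self):
        pass

class PrintNumbers(Task):
    def __init__(self, n=10):
        self.n = n
        self.lines = []
    def run(self):
        self.lines = [str(i) for i in range(1, self.n + 1)]

class SquaredNumbers(Task):
    def __init__(self, upstream):
        self.upstream = upstream
        self.lines = []
    def requires(self):
        return [self.upstream]
    def run(self):
        self.lines = [str(int(x) ** 2) for x in self.upstream.lines]

def build(task):
    # Depth-first: run every dependency, then the task itself.
    for dep in task.requires():
        build(dep)
    task.run()

numbers = PrintNumbers(n=3)
squares = SquaredNumbers(numbers)
build(squares)
print(squares.lines)  # -> ['1', '4', '9']
```

In real luigi the scheduler does this ordering for you, and tasks exchange data through output() targets rather than instance attributes.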

This is a set of instructions for use with the blog article Streaming data from Oracle using Oracle GoldenGate and Kafka Connect.

@rmoff / September 15, 2016


First up, download the BigDataLite VM, unpack it, and import it into VirtualBox. You'll need internet access from the VM for the downloads, so make sure you include a NAT network adaptor, or one bridged onto a network with internet access. Log in to the machine as oracle/welcome1. All the work in this article is done from the command line, so you can either work in Terminal, or run ip a to determine the IP address of the VM and then SSH into it from your host machine.
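For the SSH route, the address can be pulled out of the `ip a` output. A sketch on captured sample output (the interface name and address will differ on your VM):

```shell
# Sample `ip a` output captured from a VM; on the real machine you would
# pipe `ip a` directly into the awk filter below.
sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
    inet 192.168.56.101/24 brd 192.168.56.255 scope global eth0'

# Grab the first IPv4 address and strip the /prefix length.
vm_ip=$(printf '%s\n' "$sample" | awk '/inet /{sub(/\/.*/, "", $2); print $2; exit}')
echo "$vm_ip"   # -> 192.168.56.101

# Then, from the host: ssh oracle@"$vm_ip"
```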

Install Confluent Platform

aboussetta / parse_yaml.sh
Created May 29, 2018 17:18 — forked from pkuczynski/LICENSE
Read YAML file from Bash script
#!/bin/sh
parse_yaml() {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
       -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
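The preview above is truncated mid-awk. For reference, a self-contained version in the spirit of pkuczynski's gist, with a usage sketch (the file path, prefix, and YAML content are illustrative):

```shell
# parse_yaml emits shell assignments like prefix_section_key="value",
# one per leaf key, which can then be eval'd into the current shell.
parse_yaml() {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
       -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
   awk -F"$fs" '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'", vn, $2, $3);
      }
   }'
}

# Usage: two-space indented YAML, variables prefixed with conf_.
cat > /tmp/config.yml <<'EOF'
development:
  adapter: mysql2
  database: my_db
EOF

eval "$(parse_yaml /tmp/config.yml conf_)"
echo "$conf_development_database"   # -> my_db
```

This handles simple scalar key/value trees only; lists, multi-line strings, and other YAML features are beyond what the sed/awk pipeline parses.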
aboussetta / TDA_resources.md
Created April 3, 2018 11:04 — forked from calstad/TDA_resources.md
List of resources for TDA

Quick List of Resources for Topological Data Analysis with Emphasis on Machine Learning

This is just a quick list of resources on TDA that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc., and it is by no means an exhaustive list.

Survey Papers

Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject.

Other Papers and Web Resources

aboussetta / readme.md
Created February 21, 2018 13:10 — forked from ashrithr/readme.md
Installing ELK on a single machine

Installing ELK (CentOS)

This is a short step-by-step guide to installing the Elasticsearch, Logstash, and Kibana (ELK) stack on a CentOS environment to gather and analyze logs.

I. Install JDK

rpm -ivh https://dl.dropboxusercontent.com/u/5756075/jdk-7u45-linux-x64.rpm
aboussetta / hdp-vagrantfile.rb
Created August 16, 2017 22:53 — forked from uprush/hdp-vagrantfile.rb
Vagrantfile for a 4 nodes HDP cluster on Vagrant.
# -*- mode: ruby -*-
# vi: set ft=ruby :

$script = <<SCRIPT
sudo yum -y install ntp
sudo chkconfig ntpd on
sudo /etc/init.d/ntpd start
sudo chkconfig iptables off
sudo /etc/init.d/iptables stop
sudo setenforce 0
SCRIPT
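The provisioning script above is wired into the cluster by the rest of the Vagrantfile. A minimal sketch of what that looks like for a 4-node cluster (the box name, node naming, IPs, and memory size are assumptions, not taken from the gist):

```ruby
# Minimal multi-machine Vagrantfile using the $script heredoc defined above.
Vagrant.configure("2") do |config|
  config.vm.box = "centos/6"
  config.vm.provision "shell", inline: $script

  (1..4).each do |i|
    config.vm.define "hdp-node#{i}" do |node|
      node.vm.hostname = "hdp-node#{i}"
      node.vm.network "private_network", ip: "192.168.33.#{10 + i}"
      node.vm.provider "virtualbox" do |vb|
        vb.memory = 4096
      end
    end
  end
end
```

Disabling iptables and SELinux in the provisioner, as the script does, is a common shortcut for throwaway Hadoop sandbox clusters; it is not something to carry over to production hosts.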
aboussetta / install_connector.sh
Created May 28, 2017 15:24 — forked from hkhamm/install_connector.sh
Install the Spark Cassandra Connector
#!/bin/bash
# Installs the spark-cassandra-connector and support libs
mkdir /opt/connector
cd /opt/connector
rm *.jar
curl -o ivy-2.3.0.jar \
'http://search.maven.org/remotecontent?filepath=org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar'
aboussetta / install_spark.md
Created May 28, 2017 12:44 — forked from hkhamm/install_spark.md
Install, Setup, and Test Spark and Cassandra on Mac OS X

Install, Setup, and Test Spark and Cassandra on Mac OS X

This Gist assumes you already followed the instructions to install Cassandra, created a keyspace and table, and added some data.

Install Apache Spark

brew install apache-spark