abderrahim.boussetta (aboussetta)

  • DBACENTER
  • London
aboussetta / csvprocessor.py
Created October 25, 2018 09:19 — forked from miku/csvprocessor.py
CSV processor examples for luigi. Can serialize *args to CSV. Can deserialize CSV rows into namedtuples if requested. -- "works on my machine".
from luigi.format import Format
import csvkit

class CSVOutputProcessor(object):
    """
    A simple CSV output processor to be hooked into Format's
    `pipe_writer`.

    If `cols` are given, the names are used as CSV header, otherwise no
    explicit header is written.
    """
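The preview cuts off at the docstring. The serialize-args-to-CSV and rows-to-namedtuples idea it describes can be sketched with only the stdlib csv module (the names below are illustrative, not the gist's own code, which uses csvkit):

```python
import csv
import io
from collections import namedtuple

# Serialize rows of values to CSV, with column names as the header.
cols = ["id", "name"]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(cols)
writer.writerow([1, "alice"])
writer.writerow([2, "bob"])

# Deserialize each CSV row back into a namedtuple keyed by the header.
Row = namedtuple("Row", cols)
buf.seek(0)
reader = csv.reader(buf)
next(reader)  # skip the header row
rows = [Row(*r) for r in reader]
print(rows[0].name)  # -> alice
```

Note that the csv module reads every field back as a string, so `rows[0].id` is `"1"`, not `1`.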
aboussetta / run_luigi.py
Created September 3, 2018 13:44 — forked from bonzanini/run_luigi.py
Example of Luigi task pipeline
# run with a custom --n
# python run_luigi.py SquaredNumbers --local-scheduler --n 20
import luigi

class PrintNumbers(luigi.Task):
    n = luigi.IntParameter(default=10)

    def requires(self):
        return []
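The forked gist goes on to define a SquaredNumbers task that depends on PrintNumbers. The requires()/run() contract it relies on can be mimicked in plain Python without luigi installed (class and method names below are illustrative, not the gist's):

```python
# A tiny dependency-ordered runner mimicking luigi's requires()/run()
# contract: dependencies run before the task that requires them.

class Task:
    def requires(self):
        return []
    def run(self):
        pass

class PrintNumbers(Task):
    def __init__(self, n=10):
        self.n = n
        self.lines = []
    def run(self):
        self.lines = [str(i) for i in range(1, self.n + 1)]

class SquaredNumbers(Task):
    def __init__(self, upstream):
        self.upstream = upstream
        self.lines = []
    def requires(self):
        return [self.upstream]
    def run(self):
        self.lines = [str(int(x) ** 2) for x in self.upstream.lines]

def build(task):
    # Depth-first: run every dependency, then the task itself.
    for dep in task.requires():
        build(dep)
    task.run()

numbers = PrintNumbers(n=3)
squares = SquaredNumbers(numbers)
build(squares)
print(squares.lines)  # -> ['1', '4', '9']
```

In real luigi the scheduler does this ordering for you, and tasks exchange data through output() targets rather than instance attributes.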

This is a set of instructions for use with the blog article Streaming data from Oracle using Oracle GoldenGate and Kafka Connect.

@rmoff / September 15, 2016


First up, download the BigDataLite VM, unpack it, and import it into VirtualBox. You'll need internet access from the VM for the downloads, so make sure you include a NAT network adaptor, or one bridged onto a network with internet access. Log in to the machine as oracle/welcome1. All the work in this article is done from the command line, so you can either work in Terminal, or run ip a to determine the IP address of the VM and then SSH into it from your host machine.
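For the SSH route, the address can be pulled out of the `ip a` output. A sketch on captured sample output (the interface name and address will differ on your VM):

```shell
# Sample `ip a` output captured from a VM; on the real machine you would
# pipe `ip a` directly into the awk filter below.
sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
    inet 192.168.56.101/24 brd 192.168.56.255 scope global eth0'

# Grab the first IPv4 address and strip the /prefix length.
vm_ip=$(printf '%s\n' "$sample" | awk '/inet /{sub(/\/.*/, "", $2); print $2; exit}')
echo "$vm_ip"   # -> 192.168.56.101

# Then, from the host: ssh oracle@"$vm_ip"
```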

Install Confluent Platform

aboussetta / parse_yaml.sh
Created May 29, 2018 17:18 — forked from pkuczynski/LICENSE
Read YAML file from Bash script
#!/bin/sh
parse_yaml() {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
       -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
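The preview above is truncated mid-awk. For reference, a self-contained version in the spirit of pkuczynski's gist, with a usage sketch (the file path, prefix, and YAML content are illustrative):

```shell
# parse_yaml emits shell assignments like prefix_section_key="value",
# one per leaf key, which can then be eval'd into the current shell.
parse_yaml() {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
       -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
   awk -F"$fs" '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'", vn, $2, $3);
      }
   }'
}

# Usage: two-space indented YAML, variables prefixed with conf_.
cat > /tmp/config.yml <<'EOF'
development:
  adapter: mysql2
  database: my_db
EOF

eval "$(parse_yaml /tmp/config.yml conf_)"
echo "$conf_development_database"   # -> my_db
```

This handles simple scalar key/value trees only; lists, multi-line strings, and other YAML features are beyond what the sed/awk pipeline parses.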
aboussetta / TDA_resources.md
Created April 3, 2018 11:04 — forked from calstad/TDA_resources.md
List of resources for TDA

Quick List of Resources for Topological Data Analysis with Emphasis on Machine Learning

This is just a quick list of resources on TDA that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc., and it is by no means an exhaustive list.

Survey Papers

Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject.

Other Papers and Web Resources

aboussetta / readme.md
Created February 21, 2018 13:10 — forked from ashrithr/readme.md
Installing ELK on a single machine

Installing ELK (CentOS)

This is a short step-by-step guide to installing the Elasticsearch, Logstash, and Kibana (ELK) stack on a CentOS environment to gather and analyze logs.

I. Install JDK

rpm -ivh https://dl.dropboxusercontent.com/u/5756075/jdk-7u45-linux-x64.rpm
aboussetta / hdp-vagrantfile.rb
Created August 16, 2017 22:53 — forked from uprush/hdp-vagrantfile.rb
Vagrantfile for a 4 nodes HDP cluster on Vagrant.
# -*- mode: ruby -*-
# vi: set ft=ruby :

$script = <<SCRIPT
sudo yum -y install ntp
sudo chkconfig ntpd on
sudo /etc/init.d/ntpd start
sudo chkconfig iptables off
sudo /etc/init.d/iptables stop
sudo setenforce 0
SCRIPT
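The provisioning script above is wired into the cluster by the rest of the Vagrantfile. A minimal sketch of what that looks like for a 4-node cluster (the box name, node naming, IPs, and memory size are assumptions, not taken from the gist):

```ruby
# Minimal multi-machine Vagrantfile using the $script heredoc defined above.
Vagrant.configure("2") do |config|
  config.vm.box = "centos/6"
  config.vm.provision "shell", inline: $script

  (1..4).each do |i|
    config.vm.define "hdp-node#{i}" do |node|
      node.vm.hostname = "hdp-node#{i}"
      node.vm.network "private_network", ip: "192.168.33.#{10 + i}"
      node.vm.provider "virtualbox" do |vb|
        vb.memory = 4096
      end
    end
  end
end
```

Disabling iptables and SELinux in the provisioner, as the script does, is a common shortcut for throwaway Hadoop sandbox clusters; it is not something to carry over to production hosts.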
aboussetta / install_connector.sh
Created May 28, 2017 15:24 — forked from hkhamm/install_connector.sh
Install the Spark Cassandra Connector
#!/bin/bash
# Installs the spark-cassandra-connector and support libs
mkdir /opt/connector
cd /opt/connector
rm *.jar
curl -o ivy-2.3.0.jar \
'http://search.maven.org/remotecontent?filepath=org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar'
aboussetta / install_spark.md
Created May 28, 2017 12:44 — forked from hkhamm/install_spark.md
Install, Setup, and Test Spark and Cassandra on Mac OS X

Install, Setup, and Test Spark and Cassandra on Mac OS X

This Gist assumes you already followed the instructions to install Cassandra, created a keyspace and table, and added some data.

Install Apache Spark

brew install apache-spark