
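import org.apache.spark.sql.SaveMode

// Append to an existing table; insertInto matches columns by position, not by name
df.write
  .format("parquet")
  .mode(SaveMode.Append)
  .option("compression", "snappy")
  .insertInto(table)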
tomduhourq / loggerspark.scala
Created December 24, 2018 20:07
Log4J Logger for Spark
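import org.apache.log4j.{LogManager, Logger}

// @transient: left out of serialization; lazy: re-created on first use after deserialization (e.g. on an executor)
@transient private lazy val logger: Logger = LogManager.getRootLogger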
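Why the `@transient lazy val` pattern works can be checked without Spark. The sketch below assumes Scala 2.x (the build further down uses 2.11.8) and assumes Spark ships the enclosing object with Java serialization; `NonSerializableResource`, `Job` and `TransientLazyDemo` are hypothetical names, with the resource standing in for the logger. `@transient` keeps the field out of the serialized bytes, and `lazy` rebuilds it on first access on the deserialized copy.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a resource that cannot be serialized (e.g. a logger).
class NonSerializableResource

// Same pattern as the gist: @transient skips the field during serialization,
// lazy re-creates it on first access after deserialization.
class Job extends java.io.Serializable {
  @transient lazy val resource: NonSerializableResource = new NonSerializableResource
}

object TransientLazyDemo {
  // Serialize and deserialize with Java serialization, assumed here to mimic
  // how an object travels from the driver to an executor.
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val job = new Job
    val onDriver = job.resource        // initialized before serialization
    val shipped = roundTrip(job)       // succeeds only because the field is @transient
    val onExecutor = shipped.resource  // rebuilt lazily on the deserialized copy
    println(onDriver ne onExecutor)    // a fresh instance per deserialized copy
  }
}
```

Without `@transient`, `writeObject` would fail with `NotSerializableException` once the resource has been initialized; without `lazy`, the field would simply be `null` on the copy.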
tomduhourq / SparkMapPartitionsHttpClient.scala
Created December 24, 2018 20:05
Example of REST calls with mapPartitions
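import org.apache.spark.rdd.RDD
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Future.sequence also needs an implicit ExecutionContext in scope
val serviceResponsesRDD: RDD[List[ServiceResponse]] = requestsRDD.mapPartitions { requestIterator =>
  val requests = requestIterator.toList
  logger.info(s"[START] Query service with ${requests.length} requests")
  // Provides a CloseableHttpClient with back-off in case the service fails
  val httpClient = HttpClientBuilder.withBackoff()
  val responsesToAwait = requests.map(request => Service.call(request, httpClient))
  val sequenceResponses = Future.sequence(responsesToAwait)
  val responses = Await.result(sequenceResponses, 2.minutes)
  // Assumed completion (the gist preview stops above): release the client, return one List per partition
  httpClient.close()
  Iterator(responses)
}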
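The per-partition pattern in this gist (one client per partition, every call in flight at once, then a single blocking wait) can be exercised in plain Scala. In this sketch `Request`, `Response`, `FakeClient` and `PartitionCalls` are hypothetical stand-ins for the gist's request type, `ServiceResponse`, the back-off HTTP client and the Spark job; `callPartition` has the `Iterator[A] => Iterator[B]` shape that `mapPartitions` expects.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical request/response types standing in for the gist's own.
final case class Request(id: Int)
final case class Response(id: Int, body: String)

// Hypothetical client standing in for the gist's back-off CloseableHttpClient.
final class FakeClient extends AutoCloseable {
  @volatile var closed = false
  def call(request: Request): Future[Response] =
    Future(Response(request.id, s"ok-${request.id}"))
  override def close(): Unit = { closed = true }
}

object PartitionCalls {
  // Same shape as the function given to RDD.mapPartitions: Iterator in, Iterator out.
  def callPartition(requestIterator: Iterator[Request],
                    newClient: () => FakeClient): Iterator[List[Response]] = {
    val requests = requestIterator.toList      // materialize the partition
    val client = newClient()                   // one client per partition, not per request
    try {
      val pending = requests.map(client.call)  // all calls in flight concurrently
      val responses = Await.result(Future.sequence(pending), 2.minutes)
      Iterator.single(responses)               // one List per partition
    } finally client.close()
  }
}
```

`Future.sequence` keeps the input order, so each response lines up with its request. The trade-off is that `toList` holds the whole partition in memory, which is also true of the gist's version.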
tomduhourq / ElasticSearch.scala
Created December 24, 2018 19:57
ElasticSearchHelper
import org.apache.spark.sql.{DataFrame, SparkSession}

object ElasticSearch extends Serializable {
  def query(spark: SparkSession, query: String, index: String): DataFrame =
    spark.read.format("es")
      .option("es.nodes", Configuration.elasticSearch.host)
      .option("es.port", Configuration.elasticSearch.port)
      .option("es.nodes.wan.only", "true")
      .option("pushdown", "true")
      .option("es.query", query) // assumed completion: the gist preview stops at "pushdown"
      .load(index)
}
tomduhourq / build.sbt
Created December 24, 2018 19:51
SBT build with Elasticsearch, PostgreSQL, HTTP client...
name := "etl-store-availability"
version := "0.1"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark
  "org.apache.spark" %% "spark-core" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided",
  // ... Elasticsearch, PostgreSQL and HTTP client dependencies
)
tomduhourq / create_boto_cli.py
Last active December 24, 2018 20:16
Create a boto3 session and client
import boto3


def create_session(profile_name, region_name):
    return boto3.session.Session(profile_name=profile_name, region_name=region_name)


def create_client(profile_name, region_name, service_name):
    session = create_session(profile_name, region_name)
    return session.client(service_name)
tomduhourq / aws-boto-s3-download-directory.py
Created September 6, 2017 18:12 — forked from freewayz/aws-boto-s3-download-directory.py
Download files and folders from Amazon S3 to the local system using boto and Python
#!/usr/bin/env python
import os
import sys

import boto
from boto.exception import S3ResponseError
from boto.s3.key import Key

DOWNLOAD_LOCATION_PATH = os.path.expanduser("~") + "/s3-backup/"
if not os.path.exists(DOWNLOAD_LOCATION_PATH):
    os.makedirs(DOWNLOAD_LOCATION_PATH)  # create the local backup directory on first run
tomduhourq / create_emr_cluster.py
Last active August 30, 2017 15:45
Create a transient EMR cluster with boto3
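import boto3

# Placeholder credentials; a named profile or an instance role avoids hard-coding keys
ACCESS_KEY_ID = "xxxxxxxx"
SECRET_ACCESS_KEY = "xxxxxxxx"

client = boto3.client('emr',
                      region_name='region',  # placeholder, e.g. 'us-east-1'
                      aws_access_key_id=ACCESS_KEY_ID,
                      aws_secret_access_key=SECRET_ACCESS_KEY)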

Advanced Functional Programming with Scala - Notes

Copyright © 2017 Fantasyland Institute of Learning. All rights reserved.

1. Mastering Functions

A function is a mapping from one set, called the domain, to another set, called the codomain. A function associates every element in the domain with exactly one element in the codomain. In Scala, both domain and codomain are types.

val square: Int => Int = x => x * x
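To make the "exactly one element in the codomain" condition concrete, here is a small sketch; `FunctionsDemo`, `parseUnsafe` and `parse` are illustrative names, not taken from the notes. `parseUnsafe` throws on input such as `"abc"`, so not every `String` in its domain has an image in `Int`; widening the codomain to `Option[Int]` gives every `String` exactly one result again.

```scala
import scala.util.Try

object FunctionsDemo {
  // Total: every Int in the domain maps to exactly one Int (overflow wraps, it never throws).
  val square: Int => Int = x => x * x

  // Not total: "abc".toInt throws NumberFormatException, so some Strings have no result.
  val parseUnsafe: String => Int = s => s.toInt

  // Total again: the codomain is widened so every String maps to exactly one Option[Int].
  val parse: String => Option[Int] = s => Try(s.toInt).toOption

  def main(args: Array[String]): Unit = {
    println(square(-3))   // 9
    println(parse("42"))  // Some(42)
    println(parse("abc")) // None
  }
}
```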
import scala.io.Source
val lines = Source.fromFile("your-path-here").getLines()