@lemanuel
Forked from jaceklaskowski/sparkathon-agenda.md
Created January 25, 2017 10:01
Sparkathon in Warsaw - Development Activities

Spark-a-thon - Development Activities

Structured Streaming

  1. Developing a custom StreamSourceProvider
  2. Migrating TextSocketStream to SparkSession (currently uses SQLContext)
  3. Developing Sink and Source for Apache Kafka
  4. JDBC support (with PostgreSQL as the database)
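One possible starting point for the JDBC item is a `ForeachWriter` that pushes each streamed row into PostgreSQL over plain JDBC. This is only a sketch under assumptions: the class name `JdbcLineWriter`, the table `lines`, and its column `value` are hypothetical, and the PostgreSQL JDBC driver is assumed to be on the executor classpath.

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.ForeachWriter

// Hypothetical writer: inserts every incoming string into a table named `lines`.
class JdbcLineWriter(url: String, user: String, password: String)
    extends ForeachWriter[String] {

  // Created in open() on the executor, so nothing non-serializable is shipped.
  private var connection: Connection = _
  private var statement: PreparedStatement = _

  override def open(partitionId: Long, version: Long): Boolean = {
    connection = DriverManager.getConnection(url, user, password)
    statement = connection.prepareStatement("INSERT INTO lines (value) VALUES (?)")
    true // process every partition
  }

  override def process(value: String): Unit = {
    statement.setString(1, value)
    statement.executeUpdate()
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (statement != null) statement.close()
    if (connection != null) connection.close()
  }
}
```

Given a streaming `Dataset[String]` called `lines`, it could be attached with `lines.writeStream.foreach(new JdbcLineWriter(url, user, password)).start()`. Opening one connection per partition keeps the sketch short; a real sink would likely batch inserts and handle retries.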

Spark SQL

  1. Creating custom Encoder
  2. Custom format, i.e. spark.read.format(...) or spark.write.format(...)
  3. Multiline JSON reader / writer
  4. SQLQueryTestSuite - a new facility in Spark 2.0 for writing tests for Spark SQL
  5. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
  6. ExecutionListenerManager
  7. (done) Developing a custom RuleExecutor and enabling it in Spark
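For the ExecutionListenerManager item, a minimal sketch of registering a `QueryExecutionListener` through `spark.listenerManager` might look like the following. The names `TimingListener` and `TimingListenerDemo` are made up for illustration, and the local master is an assumption for running it on a laptop.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical listener: prints how long each Dataset action took.
class TimingListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName took ${durationNs / 1000000} ms")

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
}

object TimingListenerDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run
      .appName("listener-demo")
      .getOrCreate()

    // ExecutionListenerManager is exposed as spark.listenerManager.
    spark.listenerManager.register(new TimingListener)

    spark.range(1000).count() // an action, so the listener gets notified
    spark.stop()
  }
}
```

The listener also receives the `QueryExecution`, so the same hook could be used to inspect logical or physical plans rather than just timings.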

Spark MLlib

  1. Creating custom Transformer
  • Example: Tokenizer
  • Jonatan + Kuba + the ladies (Justyna + Magda)
  • The problem is saving a Pipeline that contains this Transformer, then reading it back and using it.
  2. Spark MLlib 2.0 Activator
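As a rough illustration of the custom Transformer item, here is a sketch built on `UnaryTransformer`, the same base class `Tokenizer` uses. The class `UpperCaser` is hypothetical. Mixing in `DefaultParamsWritable` and providing a `DefaultParamsReadable` companion is one way to approach pipeline persistence, but this assumes a Spark release in which those helper traits are public, which may not hold for every 2.x version.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical transformer: upper-cases a string column.
class UpperCaser(override val uid: String)
    extends UnaryTransformer[String, String, UpperCaser]
    with DefaultParamsWritable { // assumption: trait is public in the Spark version used

  def this() = this(Identifiable.randomUID("upperCaser"))

  // The per-row function applied to the input column.
  override protected def createTransformFunc: String => String = _.toUpperCase

  // Type of the generated output column.
  override protected def outputDataType: DataType = StringType
}

// The companion object gives the class a load(path) method for reading it back.
object UpperCaser extends DefaultParamsReadable[UpperCaser]
```

It would be configured like any built-in stage, for example `new UpperCaser().setInputCol("text").setOutputCol("upper")`, and then placed in a `Pipeline` alongside other stages.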

Core

  1. Monitoring executors (metrics, e.g. memory usage) using SparkListener.onExecutorMetricsUpdate.
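A minimal sketch of the listener side of this activity could look as follows. The class name `ExecutorMetricsLogger` is hypothetical, and the sketch only reads fields of the event that have been present since Spark 2.0 (`execId` and `accumUpdates`); extracting memory figures from the accumulator updates is left open.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorMetricsUpdate}

// Hypothetical listener: logs each metrics heartbeat received from an executor.
class ExecutorMetricsLogger extends SparkListener {
  override def onExecutorMetricsUpdate(update: SparkListenerExecutorMetricsUpdate): Unit = {
    // accumUpdates holds one entry per running task reported in this heartbeat.
    println(s"executor ${update.execId}: ${update.accumUpdates.size} task update(s)")
  }
}
```

It can be registered programmatically with `spark.sparkContext.addSparkListener(new ExecutorMetricsLogger)` or by listing the fully qualified class name in the `spark.extraListeners` configuration property.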

Misc

  1. Develop a new Scala-only TCP-based Apache Kafka client
  2. Work on issues reported in TensorFrames.
  3. Review open issues in Spark's JIRA and pick one to work on.