@lemanuel
Forked from jaceklaskowski/sparkathon-agenda.md
Created January 25, 2017 10:01

Revisions

  1. @jaceklaskowski revised this gist Sep 28, 2016. 1 changed file with 6 additions and 6 deletions.
    12 changes: 6 additions & 6 deletions sparkathon-agenda.md
    @@ -9,6 +9,12 @@

    ## Spark SQL

    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder)
    * [SPARK-17668 Support representing structs with case classes and tuples in spark sql udf inputs](https://issues.apache.org/jira/browse/SPARK-17668)
    * Create an encoder between your custom domain object of type `T` and JSON or CSV
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.
    * Read [Encoders - Internal Row Converters](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-Encoder.html)
    * (advanced/integration) Create an encoder for [Apache Arrow](https://arrow.apache.org/) (esp. after the [arrow-0.1.0 RC0](http://mail-archives.apache.org/mod_mbox/arrow-dev/201609.mbox/%3CCAO%2Bvc4BCBFY_3ZoASQ9UcMjOX_OjDg2nE9rTCoC3G5CiKqUC1w%40mail.gmail.com%3E) release candidate has recently been announced) and [ARROW-288 Implement Arrow adapter for Spark Datasets](https://issues.apache.org/jira/browse/ARROW-288).
    1. Custom format, i.e. `spark.read.format(...)` or `spark.write.format(...)`
    2. Multiline JSON reader / writer
    2. `SQLQueryTestSuite` - this is a very fresh thing in Spark 2.0 to write tests for Spark SQL
    @@ -18,12 +24,6 @@
    7. (done) Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    * Answering [Extending Spark Catalyst optimizer with own rules](http://stackoverflow.com/q/36152173/1305344) on StackOverflow
    * [Sparkathon - Developing Spark Extensions in Scala](http://www.meetup.com/WarsawScala/events/234156519/) on Sep 28th
    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder)
    * [SPARK-17668 Support representing structs with case classes and tuples in spark sql udf inputs](https://issues.apache.org/jira/browse/SPARK-17668)
    * Create an encoder between your custom domain object of type `T` and JSON or CSV
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.
    * Read [Encoders - Internal Row Converters](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-Encoder.html)
    * (advanced/integration) Create an encoder for [Apache Arrow](https://arrow.apache.org/) (esp. after the [arrow-0.1.0 RC0](http://mail-archives.apache.org/mod_mbox/arrow-dev/201609.mbox/%3CCAO%2Bvc4BCBFY_3ZoASQ9UcMjOX_OjDg2nE9rTCoC3G5CiKqUC1w%40mail.gmail.com%3E) release candidate has recently been announced) and [ARROW-288 Implement Arrow adapter for Spark Datasets](https://issues.apache.org/jira/browse/ARROW-288).

    ## Spark MLlib

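For the custom `Encoder` item in this revision, the public `Encoders` factories are the easiest place to start before dropping down to the internal `ExpressionEncoder` machinery. A minimal Scala sketch, assuming Spark 2.0.x and a hypothetical `Person` case class:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical domain type used only for illustration.
case class Person(id: Long, name: String)

object EncoderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-sketch").getOrCreate()

    // The built-in product encoder already covers case classes like Person.
    val personEncoder: Encoder[Person] = Encoders.product[Person]
    val people = spark.createDataset(Seq(Person(1, "Ada"), Person(2, "Grace")))(personEncoder)
    people.show()

    // A truly custom Encoder (e.g. T <-> JSON or CSV text) has to assemble an
    // ExpressionEncoder with its own serializer/deserializer expressions, which is
    // internal API in Spark 2.0 -- hence the agenda item above.
    spark.stop()
  }
}
```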
  2. @jaceklaskowski revised this gist Sep 28, 2016. 1 changed file with 3 additions and 2 deletions.
    5 changes: 3 additions & 2 deletions sparkathon-agenda.md
    @@ -15,8 +15,9 @@
    * [Changelog](https://github.com/apache/spark/commits/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala)
    4. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    * Answering [Extending Spark Catalyst optimizer with own rules](http://stackoverflow.com/q/36152173/1305344) on StackOverflow
    7. (done) Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    * Answering [Extending Spark Catalyst optimizer with own rules](http://stackoverflow.com/q/36152173/1305344) on StackOverflow
    * [Sparkathon - Developing Spark Extensions in Scala](http://www.meetup.com/WarsawScala/events/234156519/) on Sep 28th
    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder)
    * [SPARK-17668 Support representing structs with case classes and tuples in spark sql udf inputs](https://issues.apache.org/jira/browse/SPARK-17668)
    * Create an encoder between your custom domain object of type `T` and JSON or CSV
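For the custom `RuleExecutor` item marked "(done)" above, the lowest-friction way to run your own Catalyst rule is `spark.experimental.extraOptimizations`, which feeds extra `Rule[LogicalPlan]` instances into the existing optimizer instead of standing up a whole new `RuleExecutor`. A rough sketch, assuming Spark 2.0.x:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A no-op rule that only prints the plan it is given; a real rule would return a
// rewritten plan instead.
object PrintPlanRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    println(s"Optimizing:\n$plan")
    plan
  }
}

object ExtraOptimizationsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("extra-rules").getOrCreate()

    // ExperimentalMethods injects the rule into the optimizer for this session.
    spark.experimental.extraOptimizations = Seq(PrintPlanRule)

    spark.range(10).selectExpr("id * 2 AS doubled").explain(true)
    spark.stop()
  }
}
```

Writing and enabling a standalone `RuleExecutor`, as the item proposes, goes further than this, but the `Rule[LogicalPlan]` contract is the same.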
  3. @jaceklaskowski revised this gist Sep 28, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions sparkathon-agenda.md
    @@ -18,6 +18,7 @@
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    * Answering [Extending Spark Catalyst optimizer with own rules](http://stackoverflow.com/q/36152173/1305344) on StackOverflow
    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder)
    * [SPARK-17668 Support representing structs with case classes and tuples in spark sql udf inputs](https://issues.apache.org/jira/browse/SPARK-17668)
    * Create an encoder between your custom domain object of type `T` and JSON or CSV
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.
    * Read [Encoders - Internal Row Converters](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-Encoder.html)
  4. @jaceklaskowski revised this gist Sep 23, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion sparkathon-agenda.md
    @@ -13,10 +13,10 @@
    2. Multiline JSON reader / writer
    2. `SQLQueryTestSuite` - this is a very fresh thing in Spark 2.0 to write tests for Spark SQL
    * [Changelog](https://github.com/apache/spark/commits/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala)
    * Filipe
    4. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    * Answering [Extending Spark Catalyst optimizer with own rules](http://stackoverflow.com/q/36152173/1305344) on StackOverflow
    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder)
    * Create an encoder between your custom domain object of type `T` and JSON or CSV
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.
  5. @jaceklaskowski revised this gist Sep 22, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion sparkathon-agenda.md
    @@ -21,7 +21,7 @@
    * Create an encoder between your custom domain object of type `T` and JSON or CSV
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.
    * Read [Encoders - Internal Row Converters](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-Encoder.html)
    * (advanced) Create an encoder for [Apache Arrow](https://arrow.apache.org/) (esp. after the [arrow-0.1.0 RC0](http://mail-archives.apache.org/mod_mbox/arrow-dev/201609.mbox/%3CCAO%2Bvc4BCBFY_3ZoASQ9UcMjOX_OjDg2nE9rTCoC3G5CiKqUC1w%40mail.gmail.com%3E) release candidate has recently been announced).
    * (advanced/integration) Create an encoder for [Apache Arrow](https://arrow.apache.org/) (esp. after the [arrow-0.1.0 RC0](http://mail-archives.apache.org/mod_mbox/arrow-dev/201609.mbox/%3CCAO%2Bvc4BCBFY_3ZoASQ9UcMjOX_OjDg2nE9rTCoC3G5CiKqUC1w%40mail.gmail.com%3E) release candidate has recently been announced) and [ARROW-288 Implement Arrow adapter for Spark Datasets](https://issues.apache.org/jira/browse/ARROW-288).

    ## Spark MLlib

  6. @jaceklaskowski revised this gist Sep 22, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion sparkathon-agenda.md
    @@ -21,7 +21,7 @@
    * Create an encoder between your custom domain object of type `T` and JSON or CSV
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.
    * Read [Encoders - Internal Row Converters](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-Encoder.html)
    * (advanced) Create an encoder for [Apache Arrow](https://arrow.apache.org/) (esp. after the [arrow-0.1.0 RC0](http://mail-archives.apache.org/mod_mbox/arrow-dev/201609.mbox/browser) release candidate has recently been announced).
    * (advanced) Create an encoder for [Apache Arrow](https://arrow.apache.org/) (esp. after the [arrow-0.1.0 RC0](http://mail-archives.apache.org/mod_mbox/arrow-dev/201609.mbox/%3CCAO%2Bvc4BCBFY_3ZoASQ9UcMjOX_OjDg2nE9rTCoC3G5CiKqUC1w%40mail.gmail.com%3E) release candidate has recently been announced).

    ## Spark MLlib

  7. @jaceklaskowski revised this gist Sep 22, 2016. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions sparkathon-agenda.md
    @@ -18,7 +18,10 @@
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder)
    * Create an encoder between your custom domain object of type `T` and JSON or CSV
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.
    * Read [Encoders - Internal Row Converters](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-Encoder.html)
    * (advanced) Create an encoder for [Apache Arrow](https://arrow.apache.org/) (esp. after the [arrow-0.1.0 RC0](http://mail-archives.apache.org/mod_mbox/arrow-dev/201609.mbox/browser) release candidate has recently been announced).

    ## Spark MLlib

  8. @jaceklaskowski revised this gist Sep 16, 2016. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion sparkathon-agenda.md
    @@ -34,7 +34,9 @@

    ## Misc

    1. Develop a new Scala-only Kafka client
    1. Develop a new Scala-only TCP-based [Apache Kafka](http://kafka.apache.org/) client
    * [A Guide To The Kafka Protocol](https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol)
    * [KAFKA-3360 Add a protocol page/section to the official Kafka documentation](https://issues.apache.org/jira/browse/KAFKA-3360)
    * See [Scala Kafka Client](https://github.com/cakesolutions/scala-kafka-client) for inspiration yet it's just _"a thin Scala wrapper over the official Apache Kafka Java Driver"_
    9. Working on Issues reported in [TensorFrames](https://github.com/databricks/tensorframes/issues).
    10. Review open issues in [Spark's JIRA](https://issues.apache.org/jira/browse/SPARK-17375?jql=project%20%3D%20SPARK%20AND%20status%20%3D%20Open) and pick one to work on.
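For the Scala-only Kafka client idea added in this revision, the protocol guide linked above describes size-prefixed binary requests with a small common header. A rough framing sketch; the field layout follows that guide and the api key value is an assumption to be re-checked against the current protocol spec before talking to a real broker:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Encodes only the common request header described in "A Guide To The Kafka Protocol":
// api_key (int16), api_version (int16), correlation_id (int32), client_id (an int16
// length-prefixed string), with the whole message prefixed by its size as an int32.
object KafkaFramingSketch {

  def frameHeader(apiKey: Short, apiVersion: Short, correlationId: Int, clientId: String): Array[Byte] = {
    val id = clientId.getBytes(StandardCharsets.UTF_8)

    val header = ByteBuffer.allocate(2 + 2 + 4 + 2 + id.length)
    header.putShort(apiKey).putShort(apiVersion).putInt(correlationId)
    header.putShort(id.length.toShort).put(id)
    header.flip()

    val framed = ByteBuffer.allocate(4 + header.remaining())
    framed.putInt(header.remaining()).put(header)
    framed.array()
  }

  def main(args: Array[String]): Unit = {
    // api key 3 is Metadata according to the guide; treat that value as an assumption.
    val bytes = frameHeader(apiKey = 3, apiVersion = 0, correlationId = 1, clientId = "sparkathon")
    println(s"Framed request header: ${bytes.length} bytes")
  }
}
```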
  9. @jaceklaskowski revised this gist Sep 15, 2016. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions sparkathon-agenda.md
    @@ -34,5 +34,7 @@

    ## Misc

    1. Develop a new Scala-only Kafka client
    * See [Scala Kafka Client](https://github.com/cakesolutions/scala-kafka-client) for inspiration yet it's just _"a thin Scala wrapper over the official Apache Kafka Java Driver"_
    9. Working on Issues reported in [TensorFrames](https://github.com/databricks/tensorframes/issues).
    10. Review open issues in [Spark's JIRA](https://issues.apache.org/jira/browse/SPARK-17375?jql=project%20%3D%20SPARK%20AND%20status%20%3D%20Open) and pick one to work on.
  10. @jaceklaskowski revised this gist Sep 14, 2016. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions sparkathon-agenda.md
    @@ -30,9 +30,9 @@

    ## Core

    1. Monitoring executors (metrics, e.g. memory usage) using [SparkListener.onExecutorMetricsUpdate](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.SparkListener@onExecutorMetricsUpdate(executorMetricsUpdate:org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate):Unit)
    1. Monitoring executors (metrics, e.g. memory usage) using [SparkListener.onExecutorMetricsUpdate](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.SparkListener@onExecutorMetricsUpdate(executorMetricsUpdate:org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate):Unit).

    ## Misc

    9. Working on Issues reported in [TensorFrames](https://github.com/databricks/tensorframes/issues)
    9. Working on Issues reported in [TensorFrames](https://github.com/databricks/tensorframes/issues).
    10. Review open issues in [Spark's JIRA](https://issues.apache.org/jira/browse/SPARK-17375?jql=project%20%3D%20SPARK%20AND%20status%20%3D%20Open) and pick one to work on.
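The executor-monitoring item under Core boils down to registering a `SparkListener` and overriding `onExecutorMetricsUpdate`. A minimal sketch; exactly which metrics arrive in `accumUpdates` varies by Spark version, so this only reports the executor id and the number of per-task updates:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorMetricsUpdate}

// Fires on executor heartbeats; a real monitor would dig into the accumulator updates.
class ExecutorMetricsListener extends SparkListener {
  override def onExecutorMetricsUpdate(update: SparkListenerExecutorMetricsUpdate): Unit =
    println(s"Heartbeat from executor ${update.execId}: ${update.accumUpdates.size} task update(s)")
}

// Registration, given an existing SparkContext `sc` (or via the spark.extraListeners property):
//   sc.addSparkListener(new ExecutorMetricsListener)
```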
  11. @jaceklaskowski revised this gist Sep 14, 2016. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions sparkathon-agenda.md
    @@ -28,6 +28,10 @@
    * The problem is saving a Pipeline with this Transformer, then reading it back and using it.
    8. Spark MLlib 2.0 Activator

    ## Core

    1. Monitoring executors (metrics, e.g. memory usage) using [SparkListener.onExecutorMetricsUpdate](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.SparkListener@onExecutorMetricsUpdate(executorMetricsUpdate:org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate):Unit)

    ## Misc

    9. Working on Issues reported in [TensorFrames](https://github.com/databricks/tensorframes/issues)
  12. @jaceklaskowski revised this gist Sep 12, 2016. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions sparkathon-agenda.md
    @@ -4,8 +4,8 @@

    1. Developing a custom [StreamSourceProvider](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.sources.StreamSourceProvider)
    2. Migrating TextSocketStream to SparkSession (currently uses SQLContext)
    3. [Apache Kafka](http://kafka.apache.org/) support
    4. JDBC support
    3. Developing Sink and Source for [Apache Kafka](http://kafka.apache.org/)
    4. JDBC support (with PostgreSQL as the database)

    ## Spark SQL

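The custom `StreamSourceProvider` item (and, by extension, the Kafka and JDBC source ideas above) comes down to implementing `sourceSchema` and `createSource`. Below is a toy sketch against the Spark 2.0-era interfaces; `Source` and `LongOffset` are internal APIs that have changed in later releases, so treat this only as the shape of a solution:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{LongOffset, Offset, Source}
import org.apache.spark.sql.sources.StreamSourceProvider
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// A toy streaming source: each micro-batch exposes the numbers seen so far.
class CounterSourceProvider extends StreamSourceProvider {

  private val counterSchema = StructType(StructField("value", LongType) :: Nil)

  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    ("counter", counterSchema)

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source = new Source {

    private var highWaterMark = 0L

    override def schema: StructType = counterSchema

    override def getOffset: Option[Offset] = {
      highWaterMark += 1                      // pretend new data arrived
      Some(LongOffset(highWaterMark))
    }

    override def getBatch(start: Option[Offset], end: Offset): DataFrame =
      sqlContext.range(0, highWaterMark).toDF("value")

    override def stop(): Unit = ()
  }
}

// Usage (fully-qualified provider class name as the format):
//   spark.readStream.format("CounterSourceProvider").load()
```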
  13. @jaceklaskowski revised this gist Sep 8, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions sparkathon-agenda.md
    @@ -10,6 +10,7 @@
    ## Spark SQL

    1. Custom format, i.e. `spark.read.format(...)` or `spark.write.format(...)`
    2. Multiline JSON reader / writer
    2. `SQLQueryTestSuite` - this is a very fresh thing in Spark 2.0 to write tests for Spark SQL
    * [Changelog](https://github.com/apache/spark/commits/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala)
    * Filipe
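The multiline JSON item added in this revision exists because the 2.0-era JSON source expects one document per line. A common stopgap, sketched below, is to read whole files and feed complete documents to the JSON reader (the `data/multiline-json/` path is a hypothetical example); the agenda item is about building a proper reader/writer instead:

```scala
import org.apache.spark.sql.SparkSession

object MultilineJsonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multiline-json").getOrCreate()

    // Each file holds one pretty-printed JSON document; keep only the file contents.
    val wholeDocs = spark.sparkContext
      .wholeTextFiles("data/multiline-json/")
      .values

    val df = spark.read.json(wholeDocs) // RDD[String] overload, one document per element
    df.printSchema()
    spark.stop()
  }
}
```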
  14. @jaceklaskowski revised this gist Sep 6, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion sparkathon-agenda.md
    @@ -16,7 +16,7 @@
    4. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder).
    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder)
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.

    ## Spark MLlib
  15. @jaceklaskowski revised this gist Sep 6, 2016. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions sparkathon-agenda.md
    @@ -16,6 +16,8 @@
    4. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    8. Creating custom [Encoder](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoder).
    * See [Encoders](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$) for available encoders.

    ## Spark MLlib

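The `ExecutionListenerManager` item above is exercised through `QueryExecutionListener` callbacks registered on the session. A small sketch, assuming Spark 2.0.x, that reports how long each successful action took:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

object QueryListenerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("query-listener").getOrCreate()

    // ExecutionListenerManager is reached via spark.listenerManager.
    spark.listenerManager.register(new QueryExecutionListener {
      override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
        println(s"$funcName finished in ${durationNs / 1e6} ms")
      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
        println(s"$funcName failed: ${exception.getMessage}")
    })

    spark.range(100).count() // triggers onSuccess
    spark.stop()
  }
}
```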
  16. @jaceklaskowski revised this gist Sep 6, 2016. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions sparkathon-agenda.md
    @@ -4,6 +4,8 @@

    1. Developing a custom [StreamSourceProvider](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.sources.StreamSourceProvider)
    2. Migrating TextSocketStream to SparkSession (currently uses SQLContext)
    3. [Apache Kafka](http://kafka.apache.org/) support
    4. JDBC support

    ## Spark SQL

  17. @jaceklaskowski revised this gist Sep 6, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion sparkathon-agenda.md
    @@ -23,7 +23,7 @@
    * The problem is saving a Pipeline with this Transformer, then reading it back and using it.
    8. Spark MLlib 2.0 Activator

    ### Misc
    ## Misc

    9. Working on Issues reported in [TensorFrames](https://github.com/databricks/tensorframes/issues)
    10. Review open issues in [Spark's JIRA](https://issues.apache.org/jira/browse/SPARK-17375?jql=project%20%3D%20SPARK%20AND%20status%20%3D%20Open) and pick one to work on.
  18. @jaceklaskowski revised this gist Sep 6, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion sparkathon-agenda.md
    @@ -1,4 +1,4 @@
    # Spark-a-thon -- Development Activities
    # Spark-a-thon - Development Activities

    ## Structured Streaming

  19. @jaceklaskowski revised this gist Sep 6, 2016. 1 changed file with 4 additions and 6 deletions.
    10 changes: 4 additions & 6 deletions sparkathon-agenda.md
    @@ -1,13 +1,11 @@
    # Spark-a-thon
    # Spark-a-thon -- Development Activities

    ## Topics

    ### Structured Streaming
    ## Structured Streaming

    1. Developing a custom [StreamSourceProvider](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.sources.StreamSourceProvider)
    2. Migrating TextSocketStream to SparkSession (currently uses SQLContext)

    ### Spark SQL
    ## Spark SQL

    1. Custom format, i.e. `spark.read.format(...)` or `spark.write.format(...)`
    2. `SQLQueryTestSuite` - this is a very fresh thing in Spark 2.0 to write tests for Spark SQL
    @@ -17,7 +15,7 @@
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark

    ### Spark MLlib
    ## Spark MLlib

    5. Creating custom Transformer
    * Example: [Tokenizer](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
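For the "Custom format, i.e. `spark.read.format(...)`" item shown in this revision, the entry point is a `RelationProvider` returning a `BaseRelation`. A minimal, read-only sketch; the class name and the ten-row output are made up for illustration:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// A toy data source: loading it yields the numbers 0..9. Real formats would read
// `parameters` (e.g. a path) and add column pruning / filter pushdown.
class TenNumbersSource extends RelationProvider {

  override def createRelation(
      ctx: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new BaseRelation with TableScan {
      override val sqlContext: SQLContext = ctx
      override val schema: StructType =
        StructType(StructField("n", LongType, nullable = false) :: Nil)
      override def buildScan(): RDD[Row] =
        ctx.sparkContext.parallelize(0L until 10L).map(Row(_))
    }
}

// Usage with the fully-qualified class name:
//   spark.read.format("TenNumbersSource").load().show()
```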
  20. @jaceklaskowski revised this gist Sep 6, 2016. 1 changed file with 17 additions and 9 deletions.
    26 changes: 17 additions & 9 deletions sparkathon-agenda.md
    @@ -1,23 +1,31 @@
    # Spark-a-thon

    ## Agenda Proposal
    ## Topics

    1. (Structured Streaming) Developing a custom [StreamSourceProvider](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.sources.StreamSourceProvider)
    2. (Structured Streaming) Migrating TextSocketStream to SparkSession (currently uses SQLContext)
    1. (Spark SQL) Custom MF format
    ### Structured Streaming

    1. Developing a custom [StreamSourceProvider](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.sources.StreamSourceProvider)
    2. Migrating TextSocketStream to SparkSession (currently uses SQLContext)

    ### Spark SQL

    1. Custom format, i.e. `spark.read.format(...)` or `spark.write.format(...)`
    2. `SQLQueryTestSuite` - this is a very fresh thing in Spark 2.0 to write tests for Spark SQL
    * [Changelog](https://github.com/apache/spark/commits/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala)
    * Filipe
    3. https://issues.apache.org/jira/browse/SPARK-17156
    * Jacek
    * [The complete working example in Scala (with sbt)](https://github.com/jaceklaskowski/spark-workshop/tree/master/solutions/multinomial-logistic-regression)
    4. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark

    ### Spark MLlib

    5. Creating custom Transformer
    * Example: [Tokenizer](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
    * Jonatan + Kuba + the ladies (Justyna + Magda)
    * The problem is saving a Pipeline with this Transformer, then reading it back and using it.
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    8. Spark MLlib 2.0 Activator

    ### Misc

    9. Working on Issues reported in [TensorFrames](https://github.com/databricks/tensorframes/issues)
    10. Review open issues in [Spark's JIRA](https://issues.apache.org/jira/browse/SPARK-17375?jql=project%20%3D%20SPARK%20AND%20status%20%3D%20Open) and pick one to work on.
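The custom Transformer item (with Tokenizer as the model) usually starts from `UnaryTransformer`. A sketch of a toy transformer, assuming Spark 2.0.x; pipeline save/load of such a class is exactly the problem flagged in the item, since the default persistence helpers were not public API at the time:

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// A toy Transformer modelled on Tokenizer: it upper-cases a string column.
class UpperCaser(override val uid: String)
    extends UnaryTransformer[String, String, UpperCaser] {

  def this() = this(Identifiable.randomUID("upperCaser"))

  override protected def createTransformFunc: String => String = _.toUpperCase

  override protected def outputDataType: DataType = StringType

  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}

// Usage, given a DataFrame `df` with a string column "text":
//   new UpperCaser().setInputCol("text").setOutputCol("shouted").transform(df)
```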
  21. @jaceklaskowski revised this gist Sep 6, 2016. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion sparkathon-agenda.md
    @@ -2,7 +2,9 @@

    ## Agenda Proposal

    1. Custom MF format
    1. (Structured Streaming) Developing a custom [StreamSourceProvider](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.sources.StreamSourceProvider)
    2. (Structured Streaming) Migrating TextSocketStream to SparkSession (currently uses SQLContext)
    1. (Spark SQL) Custom MF format
    2. `SQLQueryTestSuite` - this is a very fresh thing in Spark 2.0 to write tests for Spark SQL
    * [Changelog](https://github.com/apache/spark/commits/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala)
    * Filipe
  22. @jaceklaskowski renamed this gist Sep 2, 2016. 1 changed file with 7 additions and 4 deletions.
    11 changes: 7 additions & 4 deletions agenda.md → sparkathon-agenda.md
    @@ -1,3 +1,7 @@
    # Spark-a-thon

    ## Agenda Proposal

    1. Custom MF format
    2. `SQLQueryTestSuite` - this is a very fresh thing in Spark 2.0 to write tests for Spark SQL
    * [Changelog](https://github.com/apache/spark/commits/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala)
    @@ -11,8 +15,7 @@
    * Jonatan + Kuba + the ladies (Justyna + Magda)
    * The problem is saving a Pipeline with this Transformer, then reading it back and using it.
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing RuleExecutor
    7. Developing a custom [RuleExecutor](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L46) and enabling it in Spark
    8. Spark MLlib 2.0 Activator
    9. TensorFlow

    Mateusz without a tie... thinking it over
    9. Working on Issues reported in [TensorFrames](https://github.com/databricks/tensorframes/issues)
    10. Review open issues in [Spark's JIRA](https://issues.apache.org/jira/browse/SPARK-17375?jql=project%20%3D%20SPARK%20AND%20status%20%3D%20Open) and pick one to work on.
  23. @jaceklaskowski revised this gist Aug 24, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions agenda.md
    @@ -4,6 +4,7 @@
    * Filipe
    3. https://issues.apache.org/jira/browse/SPARK-17156
    * Jacek
    * [The complete working example in Scala (with sbt)](https://github.com/jaceklaskowski/spark-workshop/tree/master/solutions/multinomial-logistic-regression)
    4. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
    5. Creating custom Transformer
    * Example: [Tokenizer](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
  24. @jaceklaskowski revised this gist Aug 24, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions agenda.md
    @@ -12,5 +12,6 @@
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing RuleExecutor
    8. Spark MLlib 2.0 Activator
    9. TensorFlow

    Mateusz without a tie... thinking it over
  25. @jaceklaskowski revised this gist Aug 24, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions agenda.md
    @@ -11,5 +11,6 @@
    * The problem is saving a Pipeline with this Transformer, then reading it back and using it.
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing RuleExecutor
    8. Spark MLlib 2.0 Activator

    Mateusz without a tie... thinking it over
  26. @jaceklaskowski revised this gist Aug 24, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion agenda.md
    @@ -7,7 +7,7 @@
    4. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
    5. Creating custom Transformer
    * Example: [Tokenizer](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
    * Jonatan + Kuba + the ladies
    * Jonatan + Kuba + the ladies (Justyna + Magda)
    * The problem is saving a Pipeline with this Transformer, then reading it back and using it.
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing RuleExecutor
  27. @jaceklaskowski revised this gist Aug 24, 2016. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion agenda.md
    @@ -1,5 +1,6 @@
    1. Custom MF format
    2. SQLQueryTestSuite - see gmail
    2. `SQLQueryTestSuite` - this is a very fresh thing in Spark 2.0 to write tests for Spark SQL
    * [Changelog](https://github.com/apache/spark/commits/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala)
    * Filipe
    3. https://issues.apache.org/jira/browse/SPARK-17156
    * Jacek
  28. @jaceklaskowski revised this gist Aug 24, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions agenda.md
    @@ -7,6 +7,7 @@
    5. Creating custom Transformer
    * Example: [Tokenizer](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
    * Jonatan + Kuba + the ladies
    * The problem is saving a Pipeline with this Transformer, then reading it back and using it.
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing RuleExecutor

  29. @jaceklaskowski revised this gist Aug 24, 2016. 1 changed file with 0 additions and 1 deletion.
    1 change: 0 additions & 1 deletion agenda.md
    @@ -1,5 +1,4 @@
    1. Custom MF format

    2. SQLQueryTestSuite - see gmail
    * Filipe
    3. https://issues.apache.org/jira/browse/SPARK-17156
  30. @jaceklaskowski revised this gist Aug 24, 2016. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions agenda.md
    @@ -1,13 +1,13 @@
    1. Custom MF format

    2. SQLQueryTestSuite - see gmail
    * Filipe
    * Filipe
    3. https://issues.apache.org/jira/browse/SPARK-17156
    * Jacek
    * Jacek
    4. http://stackoverflow.com/questions/39073602/i-am-running-gbt-in-spark-ml-for-ctr-prediction-i-am-getting-exception-because
    5. Creating custom Transformer
    * Example: [Tokenizer](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
    * Jonatan + Kuba + the ladies
    * Example: [Tokenizer](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
    * Jonatan + Kuba + the ladies
    6. [ExecutionListenerManager](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.util.ExecutionListenerManager)
    7. Developing RuleExecutor