@smiklos
Created June 21, 2018 18:51
Revisions

  1. smiklos created this gist Jun 21, 2018.
    17 changes: 17 additions & 0 deletions dataframe-aggs.scala
    @@ -0,0 +1,17 @@
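    // Sample data: two string grouping keys followed by four integer value columns.
    // Assumes a spark-shell session, where `spark` (a SparkSession) is already in scope.
    val df = spark.createDataFrame(Seq(
      ("col1", "col2", 4, 5, 7, 5),
      ("col1", "col2", 2, 0, 2, 2),
      ("col1", "col2", 2, 0, 2, 2),
      ("col1", "col1", 2, 0, 2, 2),
      ("col1", "col1", 5, 10, 3, 4)
    )).toDF("first_group", "second_group", "col1", "col2", "col3", "col4")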

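    // min() with no arguments takes the minimum of every numeric column for each
    // (first_group, second_group) pair; avg() then averages those minima per first_group.
    df.groupBy("first_group", "second_group").min()
      .groupBy("first_group").avg()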

    /*
    Schema of the result:
      first_group: string
      avg(min(col1)): double
      avg(min(col2)): double
      avg(min(col3)): double
      avg(min(col4)): double
    */
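
The chained `min()` and `avg()` calls above produce nested column names such as `avg(min(col1))`. As a minimal sketch, not part of the original gist, the same two-step aggregation can be written with explicit `agg` expressions and aliases. The names `valueCols`, `minPerPair`, `averaged`, and the `min_`/`avg_min_` prefixes are illustrative assumptions, and the local `SparkSession` is only there to make the snippet self-contained (in spark-shell, `getOrCreate()` should simply return the existing session).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, min}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dataframe-aggs")
  .getOrCreate()

// Same sample data as in the gist.
val df = spark.createDataFrame(Seq(
  ("col1", "col2", 4, 5, 7, 5),
  ("col1", "col2", 2, 0, 2, 2),
  ("col1", "col2", 2, 0, 2, 2),
  ("col1", "col1", 2, 0, 2, 2),
  ("col1", "col1", 5, 10, 3, 4)
)).toDF("first_group", "second_group", "col1", "col2", "col3", "col4")

// Hypothetical helper list: the numeric columns to aggregate.
val valueCols = Seq("col1", "col2", "col3", "col4")

// Step 1: minimum of each value column per (first_group, second_group) pair.
val mins = valueCols.map(c => min(c).as(s"min_$c"))
val minPerPair = df
  .groupBy("first_group", "second_group")
  .agg(mins.head, mins.tail: _*)

// Step 2: average of those minima per first_group.
val avgs = valueCols.map(c => avg(s"min_$c").as(s"avg_min_$c"))
val averaged = minPerPair
  .groupBy("first_group")
  .agg(avgs.head, avgs.tail: _*)

averaged.printSchema()
averaged.show()
```

With this sample data both `(first_group, second_group)` pairs have the minima 2, 0, 2, 2, so the single output row for `first_group = "col1"` holds the averages 2.0, 0.0, 2.0, 2.0. Listing the columns explicitly also keeps the aggregation from silently picking up any numeric column added to the DataFrame later.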