Skip to content

Instantly share code, notes, and snippets.

@dyno
Created December 1, 2022 00:12
Show Gist options
  • Select an option

  • Save dyno/da9a04207cf9c3215b7204207b0a41b1 to your computer and use it in GitHub Desktop.

Select an option

Save dyno/da9a04207cf9c3215b7204207b0a41b1 to your computer and use it in GitHub Desktop.

Revisions

  1. dyno created this gist Dec 1, 2022.
    20 changes: 20 additions & 0 deletions sparkjson.sc
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,20 @@
    import org.apache.spark.sql.types.{IntegerType, MapType, StringType, StructField, StructType}
    import org.apache.spark.sql.{Column, DataFrame, Encoders}

    val DataSchema: StructType = StructType(
    List(
    StructField("A", StringType, nullable = true), // required
    StructField("B", StringType, nullable = true), // required
    StructField("C", StringType, nullable = true) // required
    )
    )

    val eventStr = """
    {"A": 1, "C": "world"}
    {"A": 2, "B": "Hello"}
    """

    val events = eventStr.split("\n").filter(_.nonEmpty)
    val df: DataFrame = spark.createDataset(events)(Encoders.STRING).toDF("data")

    df.withColumn("json", from_json(col("data"), DataSchema)).select("json.*")