Skip to content

Instantly share code, notes, and snippets.

@yashwanth2804
Last active March 21, 2019 00:55
Show Gist options
  • Save yashwanth2804/9a16fc8d3fde74111d2dd33e4317f79d to your computer and use it in GitHub Desktop.
Save yashwanth2804/9a16fc8d3fde74111d2dd33e4317f79d to your computer and use it in GitHub Desktop.

Revisions

  1. yashwanth2804 revised this gist Mar 21, 2019. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions udf.java
    Original file line number Diff line number Diff line change
    @@ -1,3 +1,8 @@

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.expressions.UserDefinedFunction;


    StructField [] sf1 = new StructField[] {
    DataTypes.createStructField("uid",DataTypes.IntegerType, true),
    DataTypes.createStructField("mid",DataTypes.IntegerType,true),
  2. yashwanth2804 created this gist Mar 21, 2019.
    28 changes: 28 additions & 0 deletions udf.java
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,28 @@
    StructField [] sf1 = new StructField[] {
    DataTypes.createStructField("uid",DataTypes.IntegerType, true),
    DataTypes.createStructField("mid",DataTypes.IntegerType,true),
    DataTypes.createStructField("rating",DataTypes.IntegerType, true),
    DataTypes.createStructField("time",DataTypes.IntegerType, true),
    };

    StructType st1 = DataTypes.createStructType(sf1);


    Dataset<Row> mv = spark
    .read()
    .schema(st1)
    .format("com.databricks.spark.csv")
    .option("delimiter", "\t")

    .csv("/home/hasura/Desktop/SparkData/u.data");



    UserDefinedFunction increaserating = udf(
    (Integer s) -> s+1,DataTypes.IntegerType
    );

    mv.
    withColumn("rating",increaserating.apply(mv.col("rating")))
    .show();