@farooqarahim
Created January 7, 2021 19:04

Revisions

  1. farooqarahim created this gist Jan 7, 2021.
    22 changes: 22 additions & 0 deletions pyspark_read_csv.py
    @@ -0,0 +1,22 @@
    import findspark
    findspark.init()

    from pyspark.sql import SparkSession

    # Connect to Remote Spark Deployment
    # spark = SparkSession \
    # .builder.master('spark://master-node:7077') \
    # .appName("read-csv") \
    # .getOrCreate()

    spark = SparkSession \
    .builder \
    .appName("read-csv") \
    .getOrCreate()

    df = spark.read.option("header", True).csv("./csv-file.csv")  # first row supplies the column names

    print(type(df))  # a bare expression prints nothing when run as a script
    df.printSchema()
    # df.show(10,False)
    print(df.dtypes)  # list of (column name, type) pairs
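
The script above expects a file named ./csv-file.csv with a header row, but the gist does not include one. A minimal sketch for generating a small input file with the Python standard library follows; the helper name write_sample_csv and the columns id, name, and score are hypothetical and chosen only for illustration.

```python
import csv
from pathlib import Path


def write_sample_csv(path="./csv-file.csv"):
    """Write a tiny CSV with a header row.

    The column names and values are made up for illustration;
    they are not part of the original gist.
    """
    rows = [
        {"id": 1, "name": "alice", "score": 9.5},
        {"id": 2, "name": "bob", "score": 7.25},
    ]
    target = Path(path)
    # newline="" lets the csv module control line endings itself.
    with target.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "score"])
        writer.writeheader()
        writer.writerows(rows)
    return target
```

Calling write_sample_csv() before running the script gives it something to read. Because the script sets only the header option and does not enable schema inference, Spark should report every column of this file as a string in printSchema() and dtypes.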